Toward AI that operates in the real world

By Ashish Kapoor, Microsoft Research

It’s an exciting time to be a machine intelligence researcher. Recent successes in machine learning (ML) and artificial intelligence (AI), spanning from achieving human parity in speech recognition to beating world champions in board games, indicate the promise of recent methods. Most of these successes, however, are limited to agents that live and operate in the closed world of software. Such closed-world operation provides two significant advantages to these AI agents. First, these agents need to excel only at the one task they are designed for: an intelligent agent playing a board game needs only to reason about the next best move and nothing else. Second, most of these systems enjoy the luxury of near-infinite annotated training data, collected either from tediously labeled past experience or via techniques such as self-play.

Now consider robots, Internet of Things (IoT) devices, and autonomous vehicles that live and operate in the real world, beyond the narrow assumptions of the closed-world paradigm. Not only do these devices have to excel at their primary task, they also have to survive in an open world full of unmodeled exogenous phenomena and threats. Further, these systems need to adapt and learn with a minimal amount of training. Many recently successful paradigms, such as reinforcement learning, learning by demonstration, and transfer learning, are particularly challenging to apply on these devices because of the large amount of training data they require.

While there have been examples of integrative AI, where an AI system is realized by bringing together several individual components, there is a need to explore the basic principles of a core fabric for building adaptive, intelligent systems that work in the real world.

A snapshot from AirSim shows an aerial vehicle flying in an urban environment. The inset shows depth, object segmentation and front camera streams generated in real time.

At Microsoft Research, we are pursuing an ambitious agenda in robotics and cyber-physical systems, where the goal is to uncover a unifying algorithmic and technological fabric that would enable such real-world artificial intelligence. We believe three key aspects need to be addressed at a fundamental level in order to take the next big leap in building AI agents for the real world: structure, simulation, and safety, which we describe below.

Structure: One way to address data scarcity is to use the structure, both statistical and logical, of the world. The order in the environment (such as traffic rules, laws of nature, and our social circles) can be very helpful in collapsing the uncertainty that an agent faces while operating in the real world. For example, our recent work on No-Regret Replanning Under Uncertainty shows how existing robotic path-planning algorithms can exploit the statistical structure of winds to determine a near-optimal path to follow even when data is scarce.
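To make the replanning idea concrete, here is a minimal, hypothetical sketch (not the algorithm from the paper): a Dijkstra planner over a grid whose edge costs penalize flying against an estimated wind field, with a replanning step that folds a new wind observation into the estimate. The function names and the cost model are illustrative assumptions.

```python
import heapq

def plan(grid_size, start, goal, wind_mean):
    """Dijkstra over a grid; edge cost = 1 + estimated headwind penalty."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue
        x, y = node
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if not (0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size):
                continue
            # Penalize moving against the estimated wind in the target cell.
            wx, wy = wind_mean.get(nxt, (0.0, 0.0))
            headwind = max(0.0, -(dx * wx + dy * wy))
            cost = d + 1.0 + headwind
            if cost < dist.get(nxt, float("inf")):
                dist[nxt] = cost
                prev[nxt] = node
                heapq.heappush(pq, (cost, nxt))
    # Reconstruct the path from goal back to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def replan_with_observation(pos, goal, wind_mean, observed_cell, observed_wind):
    """Fold a fresh wind measurement into the estimate, then replan."""
    wind_mean = dict(wind_mean)
    wind_mean[observed_cell] = observed_wind  # collapse uncertainty locally
    return plan(8, pos, goal, wind_mean)
```

The point of the sketch is the loop structure: the statistical wind model shapes the cost function, and each observation collapses local uncertainty before the next replan.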

This figure shows the ability to generalize to different structured environments. The flying quadrotor, using the same underlying mechanism, learns to avoid obstacles autonomously for different environments.

While traditional approaches have encoded such relationships as a statistical or a logical model, systems that truly operate in the wild instead need mechanisms to efficiently infer such relationships on their own. Our recent work on Learning to Explore with Imitation is one big step in that direction: the agent learns a policy while implicitly learning about the structure of the world. A key advantage of this approach is that no explicit encoding of structural knowledge is required, allowing the algorithm to generalize across multiple problem domains. We further analyze the theoretical foundations of this idea of solving Markov Decision Processes (MDPs) with imitation learning in our upcoming RSS paper.
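The interplay between imitation and policy learning can be sketched with a DAgger-style loop on a toy problem; the tabular learner and helper names below are hypothetical stand-ins for the much richer learners used in the papers.

```python
def dagger(expert_action, initial_policy, rollout, n_iters=5):
    """DAgger-style imitation: roll out the current learner, query the
    expert on the states actually visited, aggregate those labels, and
    retrain on the growing dataset."""
    dataset = []
    policy = initial_policy
    for _ in range(n_iters):
        for state in rollout(policy):
            dataset.append((state, expert_action(state)))
        policy = train(dataset)
    return policy

def train(dataset):
    """Toy learner: memorize expert labels, fall back to a default action."""
    table = {s: a for s, a in dataset}
    return lambda s: table.get(s, 0)
```

The essential property is that the expert is queried on the learner's own state distribution, which is what lets the learner recover from its early mistakes without any explicit model of the environment's structure.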

Simulation: Simulating the real world itself is an AI-complete task, but even an approximation of reality can serve as a fundamental building block in this ambitious quest. Our popular open-source simulation project aims to bridge such simulation-to-reality gaps. Not only are we using simulation to generate meaningful training data, we also consider it an integral part of the AI agent: a portal in which to execute and verify all the actions the agent plans to take in an uncertain world. This is akin to how human beings might stop to think and simulate the consequences of their actions before acting in certain difficult situations. AI agents need the ability to be introspective and learn from this virtual thought process. The execution traces of these plans or policies are instrumental for verifying the effectiveness and correctness of a planned trajectory. Key to success in this fundamental problem is the ability to transfer the learnings and inferences that happen in simulation to the real world. We continue to invest in and explore this exciting realm of sim-to-real AI.
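A minimal sketch of this simulate-before-act pattern, assuming hypothetical `simulate`, `is_safe`, and `score` callbacks supplied by the agent:

```python
def act_with_simulation(state, candidate_actions, simulate, is_safe, score):
    """Evaluate each candidate action in simulation before committing.
    `simulate` returns the predicted next state; candidates whose
    simulated outcome is unsafe are vetoed, and the best surviving
    action is returned for real-world execution."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        predicted = simulate(state, action)  # introspective "thought"
        if not is_safe(predicted):
            continue                         # discard actions that fail in sim
        s = score(predicted)
        if s > best_score:
            best_action, best_score = action, s
    return best_action
```

In this framing, the simulator is not just a data generator: it sits inside the agent's decision loop as a gate that every planned action must pass before it touches the real world.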

The architecture of the simulation system that depicts the core components and their interactions.

Safety: When an AI agent decides to execute actions, it is paramount to consider safety, from the perspective of the agent as well as of the people and the environment around it. One possible cause of unsafe behavior is machine learning and perception systems that fail to completely collapse the uncertainty in the environment. It is well known that ML systems are never fool-proof; consequently, our recent work on Fast Second-order Cone Programming for Safe Mission Planning aims to reason, in real time, about possible safe actions to take. The core idea is to exploit the geometric structure of the uncertainty arising from the ML methods and then optimize for safe margins via Wolfe’s algorithm, which is both fast and memory-efficient. These ideas are further extended to derive safe, bandit-based methods for decision making. There are many other aspects of safety, including cybersecurity, verification, and testing, that we are exploring in collaboration with various colleagues.

We show a hypothetical scenario where a robot needs to avoid an obstacle. Imperfect sensing provides the system with a belief about the safe areas to travel (blue and red lines). The robot considers all the uncertainty in the inferences and determines a trajectory (black) that is safe with very high probability. The graph on the left shows that the proposed methodology (Wolfe’s algorithm) is very efficient, enabling real-time decisions.
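The paper's method solves a second-order cone program via Wolfe's algorithm; as a much simpler stand-in, the sketch below checks a single waypoint against a chance constraint under isotropic Gaussian position uncertainty, inflating the required clearance by the Gaussian quantile for the desired confidence. The names and the uncertainty model are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def safe_with_probability(waypoint, obstacle, sigma, robot_radius,
                          confidence=0.999):
    """Return True if the waypoint keeps clearance from the obstacle
    with at least `confidence` probability, given isotropic Gaussian
    position uncertainty with standard deviation `sigma` per axis."""
    # One-sided Gaussian quantile for the desired confidence level
    # (about 3.09 for 99.9%).
    k = NormalDist().inv_cdf(confidence)
    dx = waypoint[0] - obstacle[0]
    dy = waypoint[1] - obstacle[1]
    clearance = math.hypot(dx, dy) - robot_radius
    # The waypoint is accepted only if the margin exceeds k standard
    # deviations of the position uncertainty.
    return clearance >= k * sigma
```

This captures the qualitative behavior in the figure: larger perception uncertainty (larger `sigma`) shrinks the set of trajectories the robot will accept as safe.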

I’d like to conclude with the example of first-person-view (FPV) drone racing events, which are increasingly popular. These races entail a drone operator sitting in a chair with display glasses that project the imagery captured by a camera mounted on an extremely agile racer drone. What is incredible is that the operator is able to maneuver the craft through seemingly impossible indoor environments while maintaining very high speed. The three-pound mass sitting between the ears of the operator converts a high-dimensional video feed into a four-dimensional remote-control signal that guides the vehicle with amazing efficiency and safety. An AI agent that mimics, and possibly beats, a human brain at such tasks will embody the three aspects of structure, simulation, and safety.

Collaborators: Debadeepta Dey, Prateek Jain, Chris Lovett, Siddhartha Prakash, Gireeja Ranade, Shital Shah

Contributing interns: Sanjiban Choudhury (CMU), Niteesh Sood (MSR India), Wen Sun (CMU), Kai Zhong (University of Texas at Austin)

Relevant papers:

  1. No-Regret Replanning Under Uncertainty
     Wen Sun, Niteesh Sood, Debadeepta Dey, Gireeja Ranade, Siddharth Prakash, Ashish Kapoor
     International Conference on Robotics and Automation (ICRA) 2017
  2. Learning to Gather Information via Imitation
     Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Debadeepta Dey
     International Conference on Robotics and Automation (ICRA) 2017
  3. Fast Second-order Cone Programming for Safe Mission Planning
     Kai Zhong, Prateek Jain, Ashish Kapoor
     International Conference on Robotics and Automation (ICRA) 2017
  4. Adaptive Information Gathering via Imitation Learning
     Sanjiban Choudhury, Ashish Kapoor, Gireeja Ranade, Sebastian Scherer, Debadeepta Dey
     Robotics: Science and Systems (RSS) 2017
  5. Risk-Aware Algorithms for Adversarial Contextual Bandits
     Wen Sun, Debadeepta Dey, Ashish Kapoor
     International Conference on Machine Learning (ICML) 2017
  6. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles
     Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor

 
