Microsoft Research Blog

The term ‘Artificial Intelligence’ may have been coined way back in the 1950s, but we have yet to see the types of machines described in books and films. In our daily lives and workplaces, we aspire to have machines that can help us get things done and make the most of our time.

For an artificial agent to work and interact with humans, it will need cognitive skills such as planning, critical analysis, reasoning, and decision making. Our research focuses on some core foundations of these complex and challenging capabilities: gathering knowledge about the world, learning how to interact with the world, and learning how to communicate findings to humans.

Learning to learn

Learning to learn requires the ability to generalize one’s behavior across different tasks. This relies on efficiently decomposing tasks into subtasks, which is called compositional learning. At the Microsoft Research Montreal lab, we have been working on compositional learning within the reinforcement learning framework. Our model is inspired by the different levels of decision making that happen in the human mind. As a matter of fact, human beings are very efficient at decomposing tasks into small generalizable subtasks.

Take the example of learning to drive a car. The first few driving lessons will leave you exhausted, because you will learn many new maneuvers as well as how to associate each maneuver with a given situation. After a while, with enough experience, you will be able to perform some maneuvers without being consciously aware of it (for example, starting the car on a flat road, braking in time for a red light, or changing gears). These sequences of actions become reflexes, and you do not need to think about them anymore because they do not change across different driving routes; you can always perform them the same way. This makes driving a much less tiring experience and leaves your conscious mind free to deal with the unexpected.

We’re developing an approach, named Separation of Concerns, which emulates this process: we decompose tasks into smaller subtasks that can be learned efficiently and which recur across many scenarios. This lets the AI agent focus on what’s new and adapt quickly to unknown environments.

Learning to perceive

The key for an AI agent to know how to transfer its skills to an unknown environment lies in the agent’s perception of that environment. The agent must learn to recognize what is familiar and what is not. This is the subject of another stream of research: information seeking. An information-seeking agent learns what to observe in the environment to perform a given task. In other words, it learns to recognize salient features that will help it achieve its goals.

This is also inspired by the living world. The first living creatures had a limited ability to perceive the world through coarse senses. They could only react to dangerous situations. The development of nervous systems then made learning possible: creatures could associate sensory inputs and form a more precise perception of the world. With time, creatures developed more localized feedback such as pain. From being able only to react to danger, creatures evolved to be able to anticipate it before damage was done. To do this, they had to learn to recognize the features of the world that correlated with the sensory feedback (the presence or absence of pain). Because of the number of sensors, the search space in the case of animals is gigantic. It was thus crucial to learn which features to ignore and which ones to consider for a given feedback.

This is exactly what our information-seeking agents learn: they learn to focus on the most informative features of the environment to perform a task, for instance avoiding dangerous situations. This work will take AI agents one step closer to autonomy. An example of a more specific application is medical diagnosis: in this setting, it is critical to run only the tests that are correlated with a patient’s symptoms.
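To make the idea of "most informative features" concrete, here is a minimal sketch in Python. It does not reproduce the lab's information-seeking agents, which learn what to observe through experience. Instead it swaps in a classical measure, information gain, to rank which of two hypothetical medical tests says more about a diagnosis. The records, the test names (`test_x`, `test_y`), and the `sick` label are all invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(records, feature, target):
    """Reduction in uncertainty about `target` once `feature` is observed."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[feature] for r in records}:
        subset = [r[target] for r in records if r[feature] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


# Hypothetical toy data: test_x tracks the diagnosis, test_y does not.
records = [
    {"test_x": 1, "test_y": 0, "sick": 1},
    {"test_x": 1, "test_y": 1, "sick": 1},
    {"test_x": 0, "test_y": 0, "sick": 0},
    {"test_x": 0, "test_y": 1, "sick": 0},
]

gains = {f: information_gain(records, f, "sick") for f in ("test_x", "test_y")}
best = max(gains, key=gains.get)  # the test worth running first
print(gains, best)
```

In this toy data, `test_x` separates sick from healthy patients perfectly while `test_y` carries no information, so an agent with a limited budget of observations would do well to run `test_x` first.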

Perception and judgment

The famous Myers-Briggs personality test was developed to make theories of psychological types understandable and useful in people’s lives. While classifying human psychology is an infinitely complex field, this test has become popular. One of its dimensions classifies people along a spectrum from Perceiving to Judging. A perceiving personality is one that favors spontaneity and prefers reacting to new observations rather than planning ahead. At the other end of the spectrum, a judging personality tends to be organized, make decisions, and plan. Of course, these are only preferences, and every human being needs to combine perception and judgment.

This is an interesting analogy to AI research. The previous sections describe how AI agents have greatly improved their ability to perceive their environments.

What about judgment? Judgment is linked to the feedback obtained from the environment: if I have had more success following procedure A than procedure B, I am more likely to follow procedure A in the future. This is the reinforcement learning setting, which is at the core of the separation of concerns and information-seeking frameworks. From a temporal point of view, judgment comes after perception: once you have observed the situation, you need to decide on a chain of operations to attain your goal. You might mobilize skills that you have previously learned and apply them to this context, but you might also need to adapt to an unknown setting and try actions without knowing their outcome.
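The "procedure A versus procedure B" intuition can be sketched as a two-armed bandit with an epsilon-greedy rule, one of the simplest reinforcement learning settings. This is only an illustration under stated assumptions, not the method behind the separation of concerns or information-seeking work: the function name `run_bandit`, the success probabilities, and the exploration rate are all hypothetical.

```python
import random


def run_bandit(success_prob, trials=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice between procedures with unknown success rates."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    names = list(success_prob)
    counts = {n: 0 for n in names}    # how often each procedure was tried
    values = {n: 0.0 for n in names}  # running estimate of its success rate
    for _ in range(trials):
        if rng.random() < epsilon:
            # Explore: try an action without knowing its outcome.
            choice = rng.choice(names)
        else:
            # Exploit: follow the procedure that has worked best so far.
            choice = max(names, key=lambda n: values[n])
        reward = 1.0 if rng.random() < success_prob[choice] else 0.0
        counts[choice] += 1
        # Incremental mean update of the estimated success rate.
        values[choice] += (reward - values[choice]) / counts[choice]
    return counts, values


# Hypothetical environment: procedure A succeeds more often than B.
counts, values = run_bandit({"A": 0.8, "B": 0.3})
print(counts, values)
```

Because the agent mostly repeats whatever has paid off so far, it ends up following procedure A far more often than B, while the small exploration rate keeps it trying the alternative in case the environment is not what it expected.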

Judgment is only separate from perception from a temporal point of view. Indeed, as discussed previously, perception and judgment are tightly linked and they are learned simultaneously. To get closer to the efficiency of living creatures, research in AI has shifted from separating perception and judgment to learning them simultaneously. This has been made possible by advances in deep learning, which enable us to build end-to-end models that can tailor their representation of the world to the task at hand. Therefore, we develop agents that learn where to look in order to optimize their skills for a given task, and agents that learn how to decompose a task to learn to perform it more efficiently.

Learning to communicate

For AI agents to work with humans and share the skills they learn, they will need to learn to communicate. Language is the favorite and most natural communication channel between human beings. We will thus need to teach language to AI agents.

Language is the most precise tool that we have to transmit our thoughts and feelings to others. Yet, philosophers like Brice Parain struggled with language because it actually lacks precision when it comes to expressing anything complex. As Sartre wrote, there is “backlash in the gears of language”; if we try to say something, the meaning inferred by the person listening to us is likely to differ from the one intended.

Parain concluded at some point in his research that orders were the only efficient way of communicating: we could only be understood correctly if we specified a precise action for someone else to perform. Words could only make sense in a tightly ordered situation, with one person giving the order and the other receiving and executing it. Another way to see it is that one defines what the word means and how it relates to an action, and the other must adopt this specific definition.

Parain’s journey through language then took him back to an existentialist conception of language. Words shape us as we speak them because every time we use a new word, we have to invent its meaning by linking it to our own experience and perception.

Language is an imprecise tool that shapes us as we use it, but it is still our best way of knowing and being known. The problems in human-human communication are even greater in human-machine dialogue. For technological reasons, it is even more difficult to be understood by a machine than by another human being. And just like in human-human dialogue, it is often easier to use a task-oriented dialogue system and give a precise description of the task we would like to see accomplished than to try to have complex thoughts understood as they were intended. Machines are still mostly engaging in goal-oriented dialogues where the user needs the machine’s help to accomplish a task, e.g., sending emails or troubleshooting.

We are focusing on such action-based dialogue for now, because even in this simple setting, there are still many open problems. In addition, as Parain noticed, this setting limits miscommunication issues and enables us to experiment with models and algorithms in a relatively controlled setting. Our work at the Montreal lab focuses on using memory in a dialogue to follow a human being’s chain of thought, on communicating results through language (e.g., results from searching a database), and on the art of efficient information exchange through dialogue.

We can hope that soon machines will also be capable of a sophisticated use of the language that shapes us, that is not only based on actions but is playful and ambiguous, to the point that they will be able to practice such things as subtlety and sarcasm.


The research at the Microsoft Research Montreal lab is working toward major milestones in creating AI agents that can efficiently and autonomously understand the world, look for information, and communicate their findings to humans.

Layla El Asri at ML Conf New York

Research Scientist Layla El Asri gave a presentation on this subject at ML Conf New York. A video of her presentation is available below:
