Intelligent agents that can handle human language play a growing role in personalized, ubiquitous computing and the everyday use of devices. Agents need to be able to communicate and collaborate with humans in ways that are seamless and natural, and to be able to learn new behaviors, concepts, and relationships as first-class operations. In other words, our devices need to be able to converse with us.
In this project, Microsoft Research AI teams are interested in questions like: How can machines discuss or answer questions about an image, story, or video? How can they acquire new knowledge within a conversation, and surface their own uncertainties? How can they learn to ask the right kinds of questions to drive interactions forward in useful ways? How can a machine learn to understand and manipulate its surroundings through dialog? How can a user instruct a device to conduct a new task through conversation?
Grounded Conversation: One important theme in this research is that of grounding in the real world. A hallmark of human language is that it can refer to people, places, objects, actions, and real-world concepts; that is, it can be grounded in the physical world. We use natural language to understand, explain, and manipulate the world around us. Our intelligent agents need to be able to do the same.
Interactive Learning in Dialog: A second, related theme in this research is interactive learning within a dialog. People are constantly expanding their set of grounded references to entities, and forming new relationships among entities. We are exploring methods for enabling conversational machines to acquire new knowledge and relationships, identify and correct misunderstandings, and tolerate and resolve the ambiguities inherent in natural language and the real world.
Addressing both of these themes requires bringing the physical world or virtual environment to bear on the conversation. Much external information is relevant, from the mood and emotional state of the user to the time of day, weather, ambient temperature, and even a user’s vital signs from a monitoring device. Machine state and application (or game) state also affect how the conversation should proceed.
Microsoft teams working on grounded conversation projects span disciplines and subfields, including natural language processing, image processing, speech recognition, signal processing, search and information retrieval, computer/human interactive learning, deep learning, and reinforcement learning. Advances in this area will yield exciting new AI competencies for conversational systems, robots, and other emerging human-computer interfaces.