This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans in human language in ways that appear seamless and natural. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments.
Natural and Engaging
Agents that process human language will play a growing role in the future of personalized, ubiquitous computing and the everyday use of devices. To be successful, they will need to win the trust of their human users and a willingness to engage. Agents will need to partake in a full range of conversational interactions, from casual chitchat to helping users code up applications, and to do so in ways that seem natural and engaging so that humans are willing to continue to interact.
Learning to Converse
To achieve human-like performance, the technology must be data-driven. The present generation of computer dialog systems is almost exclusively hand-crafted, rule-based, or templatic, perhaps with a statistical component to help decide what kind of response is required. Such systems are not particularly robust. They don’t scale easily, extend well to new domains, or generalize to new languages. Instead, we want machines to learn how to interact on the basis of human-to-human conversational interactions found on the web and elsewhere. Accordingly, we take our cue from modern statistical machine translation and attempt to learn conversational patterns that map a message to a response.
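To make the idea of learning message-to-response mappings from data concrete, here is a minimal sketch in Python. It is not the project's method and not statistical machine translation: it substitutes a much simpler nearest-neighbor retrieval over bag-of-words vectors, purely to show a response being chosen from observed human-human pairs rather than from hand-written rules. The corpus `PAIRS` and the function names are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (message, response) pairs; illustrative only.
PAIRS = [
    ("how are you today", "doing well, thanks for asking"),
    ("what is the weather like", "it looks sunny outside"),
    ("can you help me write some code", "sure, what should the program do"),
]


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count the tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def respond(message, pairs=PAIRS):
    """Return the response paired with the most similar stored message."""
    query = bag_of_words(message)
    best = max(pairs, key=lambda pair: cosine(query, bag_of_words(pair[0])))
    return best[1]


if __name__ == "__main__":
    print(respond("how is the weather"))
```

Nothing in `respond` encodes a rule about weather or coding; its behavior comes entirely from the stored pairs, which is the property a data-driven system relies on. A retrieval stand-in like this can only repeat responses it has already seen, whereas a translation-style model generates new ones.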
A major focus of our research is context awareness. By this we mean awareness not only of the verbal content of the preceding exchange, but also of the prior history of interactions by the agent with the user; the mood and emotional state of the user together with preferences and other profile data; and also the physical and virtual environments. The physical environment may encompass location, time of day, weather, ambient temperature, and even the user’s vital signs from a monitoring device. Virtual environments may include machine state, game state and application state. All of these, and more, will eventually need to be brought to bear in order for the agent to generate coherent responses.
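The kinds of context listed above can be pictured as a single structured record that accompanies each incoming message. The following Python sketch is only one possible layout under that assumption; every class and field name is hypothetical and chosen to mirror the prose, not taken from any actual system.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from typing import Dict, List, Optional


@dataclass
class PhysicalEnvironment:
    # All fields are hypothetical names mirroring the prose description.
    location: Optional[str] = None
    time_of_day: Optional[str] = None
    weather: Optional[str] = None
    ambient_temperature_c: Optional[float] = None
    vital_signs: Dict[str, float] = field(default_factory=dict)


@dataclass
class VirtualEnvironment:
    machine_state: Dict[str, str] = field(default_factory=dict)
    game_state: Dict[str, str] = field(default_factory=dict)
    application_state: Dict[str, str] = field(default_factory=dict)


def _has_data(value):
    """True if a value, or any field of a nested dataclass, is non-empty."""
    if is_dataclass(value):
        return any(_has_data(getattr(value, f.name)) for f in fields(value))
    return value not in (None, [], {}, "")


@dataclass
class ConversationContext:
    preceding_turns: List[str] = field(default_factory=list)
    interaction_history: List[str] = field(default_factory=list)
    user_mood: Optional[str] = None
    user_profile: Dict[str, str] = field(default_factory=dict)
    physical: PhysicalEnvironment = field(default_factory=PhysicalEnvironment)
    virtual: VirtualEnvironment = field(default_factory=VirtualEnvironment)

    def populated_signals(self):
        """Names of the top-level context signals that currently carry data."""
        return [f.name for f in fields(self) if _has_data(getattr(self, f.name))]


if __name__ == "__main__":
    ctx = ConversationContext(
        preceding_turns=["should I go for a run?"],
        physical=PhysicalEnvironment(weather="rain"),
    )
    print(ctx.populated_signals())
```

Most signals will be missing in any given exchange, so each field defaults to empty and `populated_signals` reports which ones a response generator could actually condition on.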
Since statistical machine translation techniques based on phrase tables alone cannot hope to capture the richness of the contexts that might be required, we are exploring the use of neural network models as a promising approach to generating context-sensitive responses. Our paper at NAACL 2015 was an important first step in this direction.
Other aspects of our research agenda include adaptive modeling of the persona of both agent and human user; automatic generation of conversations around images; and acquisition of conversational interactions to automatically generate code that will modify or control applications on a device or machine.
Alessandro Sordoni (2014 MSR Intern)
Jian-Yun Nie (2014 Summer Visiting Researcher)
Yangfeng Ji (2014 MSR Intern)
Jiwei Li (2015 MSR Intern)
Georgios P. Spithourakis (2015 MSR Intern)
Alan Ritter (2015 Visiting Consultant)