Microsoft Research Blog


ReDial: Recommendation dialogs for bridging the gap between chit-chat and goal-oriented chatbots

November 30, 2018 | By Microsoft blog editor


Chatbots come in many flavors, but most fall into one of two categories: goal-oriented chatbots and chit-chat chatbots. Goal-oriented chatbots behave like a natural language interface for function calls: the chatbot asks for and confirms all required parameter values and then executes a function. The Cortana chat interface is a classic example of a goal-oriented chatbot. For example, you can ask about the weather for a specific location or let Cortana walk you through creating a new calendar entry.

Chit-chat bots primarily keep the user engaged. This can involve throwing in random trivia, puns, or even memes. These bots often artfully avoid going into depth on any specific topic; their conversation has no goal other than maintaining the conversation itself.

Most conversations operate somewhere between these two chatbot types. For example, when we ask a person for a recommendation, we expect that person to get to know us a bit before making a suggestion. A recommender you’d trust does more than just check boxes; they build credibility by demonstrating knowledge of the subject. Still, the exchange has a clearly defined topic and goal.

To study this setting—called conversational recommendation—a team of researchers working at Polytechnique Montréal, Mila – Quebec AI Institute, Microsoft Research Montréal, and Element AI collected a dataset of more than 11,000 dialogues in which crowd workers recommend movies to each other. The researchers used a trick to collect labels at the same time: they asked the crowd workers to tag all movie mentions in their dialogues using a drop-down search that appeared when a worker typed the `@` character. In a questionnaire following each dialogue, both participants were asked questions about the @-mentioned movies, and their answers had to agree. This cross-checked self-labeling yielded high-quality labels for the dialogues, which the researchers then used to train the components of a chatbot.
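The cross-checking step can be illustrated with a small sketch. The field names and movie IDs below (`suggested`, `seen`, `liked`, `@111776`, and so on) are hypothetical stand-ins, not the dataset's actual schema; the point is that a mention only becomes a label when both questionnaires agree.

```python
# Hypothetical sketch of cross-checked self-labeling: a movie mention is kept
# as a training label only if both participants gave the same answers about it.
def cross_check(seeker_answers, recommender_answers):
    """Return the labels on which both questionnaire respondents agree.

    Each argument maps a movie mention to a dict of answers, e.g.
    {"suggested": True, "seen": False, "liked": True}.
    """
    agreed = {}
    for movie_id, seeker in seeker_answers.items():
        recommender = recommender_answers.get(movie_id)
        if recommender is not None and seeker == recommender:
            agreed[movie_id] = seeker
    return agreed

seeker = {
    "@111776": {"suggested": True, "seen": False, "liked": True},
    "@204334": {"suggested": False, "seen": True, "liked": True},
}
recommender = {
    "@111776": {"suggested": True, "seen": False, "liked": True},
    "@204334": {"suggested": False, "seen": True, "liked": False},  # disagreement
}

# Only the movie with matching answers survives as a high-quality label.
labels = cross_check(seeker, recommender)
```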

In their baseline model, the “social” component is handled by an extension of the popular hierarchical recurrent encoder-decoder (HRED) model, while the recommendation engine is a denoising autoencoder pretrained on the MovieLens dataset. A recommendation engine typically suggests movies based on its users’ previous preferences. The researchers leveraged the collected labels to train a sentiment analysis mechanism that predicts, from the dialogue, how much the recommendation seeker liked a mentioned movie. This constitutes a novel form of sentiment analysis because, in contrast to, say, product reviews, sentiment in a dialogue can be expressed in question-and-answer form across multiple speaker turns.
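The core idea of a denoising autoencoder recommender can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's actual architecture: the dimensions, weights, and corruption scheme are toy values, and a real model would be trained on MovieLens ratings.

```python
import numpy as np

rng = np.random.default_rng(0)

n_movies, n_hidden = 6, 3
# Toy, untrained weights; in practice these are learned on a ratings corpus.
W_enc = rng.normal(0, 0.1, (n_hidden, n_movies))
W_dec = rng.normal(0, 0.1, (n_movies, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dae_forward(ratings, drop_prob=0.5):
    """Denoise a corrupted rating vector.

    Training randomly hides known ratings and asks the network to
    reconstruct the full vector, so at inference time it can fill in
    scores for movies the user has not mentioned yet.
    """
    mask = rng.random(ratings.shape) > drop_prob  # randomly drop ratings
    corrupted = ratings * mask
    hidden = sigmoid(W_enc @ corrupted)           # encode
    reconstruction = sigmoid(W_dec @ hidden)      # scores for all movies
    return reconstruction

# A seeker who liked movies 0 and 2 (1 = liked, 0 = unknown).
seen = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
scores = dae_forward(seen)
recommendation = int(np.argmax(scores))  # highest-scoring movie to suggest
```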

Finally, both systems can be combined by learning a “gating” mechanism in the decoder of the HRED model. The gate decides whether to output a word from the chatbot vocabulary or a movie title taken from the recommender.
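A gate like this can be sketched as a scalar mixture of two softmax distributions. The sketch below is an assumption about the general shape of such a mechanism, not the paper's exact formulation; the logit values are invented for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_output(vocab_logits, movie_logits, gate_logit):
    """Mix the chatbot's word distribution with the recommender's movie
    distribution via a scalar gate g in (0, 1).

    The final distribution is the concatenation
    [g * P(word), (1 - g) * P(movie)], which still sums to 1.
    """
    g = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid gate
    p_vocab = softmax(vocab_logits)
    p_movie = softmax(movie_logits)
    return np.concatenate([g * p_vocab, (1.0 - g) * p_movie])

probs = gated_output(
    vocab_logits=np.array([2.0, 0.5, 0.1]),  # e.g. three ordinary words
    movie_logits=np.array([1.5, -0.3]),      # two candidate movie titles
    gate_logit=-1.0,  # negative logit -> gate favors the recommender
)
```

At each decoding step the model would emit the highest-probability entry of `probs`, which may be either an ordinary word or a recommended title.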

While their baseline implementation is by no means perfect, it highlights the possibilities and challenges arising from the dataset collection scheme. Future models will benefit from incorporating question-answering mechanisms and external data sources to support conversations about actors, movie genres, or topics. The recommender module can improve by taking into account not only ratings but also the preferences expressed in the dialogue (which could well be mood-dependent).

The research paper will be presented at this year’s NeurIPS conference. You can find it and the dataset at the project’s website.
