Predicting User Satisfaction with Intelligent Assistants

Julia Kiseleva; Kyle Williams; Jiepu Jiang; Ahmed Awadallah; Imed Zitouni; Aidan C. Crook; Tasos Anastasakos

International Conference on Research and Development in Information Retrieval (SIGIR '16) | July 2016

Published by ACM

The use of voice-controlled intelligent personal assistants on mobile devices, such as Microsoft’s Cortana, Google Now, and Apple’s Siri, is growing rapidly. These assistants significantly change the way users interact with search systems, not only because of voice control and touch gestures, but also because of the dialogue-style nature of the interactions and the assistants’ ability to preserve context across queries. Predicting the success or failure of such search dialogues is a new problem, and an important one for evaluating and further improving intelligent assistants. While clicks in web search have been used extensively to infer user satisfaction, they are less informative in search dialogues, where clicks are partly replaced by voice control, direct and voice answers, and touch gestures.

In this paper, we propose an automatic method to predict user satisfaction with intelligent assistants that exploits all the interaction signals, including voice commands and physical touch gestures on the device. First, we conduct an extensive user study to measure user satisfaction with intelligent assistants while simultaneously recording all user interactions. Second, we show that the dialogue style of interaction makes it necessary to evaluate the user experience at the level of the overall task rather than the individual query. Third, we train a model to predict user satisfaction and find that interaction signals capturing users’ reading patterns have a high impact: including all available interaction signals raises the prediction accuracy from 71%, for a baseline that uses only click and query features, to 81%.
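
As a small worked illustration of the reported result, the sketch below computes three common ways of expressing the gain from 71% to 81% accuracy. Only the two accuracy figures come from the abstract; the function name `accuracy_gain` and its output keys are hypothetical, chosen for this example.

```python
def accuracy_gain(baseline: float, improved: float) -> dict:
    """Summarize the change between two accuracies given as fractions.

    Hypothetical helper for illustration; not part of the paper's method.
    """
    if not (0.0 < baseline < 1.0) or not (0.0 <= improved <= 1.0):
        raise ValueError("expected 0 < baseline < 1 and 0 <= improved <= 1")

    delta = improved - baseline
    return {
        # Difference in percentage points.
        "absolute_points": delta * 100,
        # Gain relative to the baseline accuracy, in percent.
        "relative_gain_pct": delta / baseline * 100,
        # Share of the baseline's errors that were eliminated, in percent.
        "error_reduction_pct": delta / (1.0 - baseline) * 100,
    }


# Accuracy figures reported in the abstract: 71% baseline, 81% with all signals.
gain = accuracy_gain(0.71, 0.81)
for name, value in gain.items():
    print(f"{name}: {value:.1f}")
```

Under these definitions the gain is 10 percentage points, which corresponds to roughly a 14% relative improvement in accuracy, or about a third of the baseline's errors removed (error rate falling from 29% to 19%).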