Conversations in the Crowd: Collecting Data for Task-Oriented Dialog Learning

In Proceedings of the Human Computation Workshop on Scaling Speech and Language Understanding and Dialog through Crowdsourcing at HCOMP 2013.

A major challenge in developing dialog systems is obtaining realistic data with which to train them for specific domains. We study the use of crowdsourcing methods to collect dialog datasets. Specifically, we introduce ChatCollect, a system that lets researchers collect conversations focused on definable tasks from pairs of crowd workers. We demonstrate that this system can collect varied and in-depth dialogs, and we describe ongoing work on a crowd-powered system for parsing semantic frames. Finally, we discuss research opportunities for using this approach to train and improve automated dialog systems.