Dialog State Tracking Challenge (DSTC)

Background and motivation

In dialog systems, “state tracking” – sometimes also called “belief tracking” – refers to accurately estimating the user’s goal as a dialog progresses. Accurate state tracking is desirable because it provides robustness to errors in speech recognition and helps reduce the ambiguity inherent in language within a temporal process like dialog. Dialog state tracking is an important problem both for traditional uni-modal dialog systems and for speech-enabled multi-modal dialog systems on mobile devices, on tablet computers, and in automobiles.
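
To make the tracking task concrete, the sketch below shows the general shape of a tracker: it maintains a distribution over candidate user goals and folds in each turn's noisy speech understanding output. The slot, values, and additive update rule are illustrative assumptions only, not a description of any particular system entered in the challenge.

# Minimal sketch (not an official DSTC baseline): a belief tracker that
# accumulates spoken language understanding (SLU) confidence scores for
# candidate values of a single slot, turn by turn. All names and the
# update rule are illustrative assumptions.

from collections import defaultdict

def update_belief(belief, slu_hypotheses):
    """Fold one turn's scored SLU hypotheses into the running belief.

    belief         -- dict mapping slot value -> probability mass
    slu_hypotheses -- list of (slot_value, confidence) pairs for this turn
    """
    new_belief = defaultdict(float, belief)
    for value, confidence in slu_hypotheses:
        # Simple additive evidence accumulation; a real tracker might use a
        # generative model, a discriminative classifier, or hand-crafted rules.
        new_belief[value] += confidence
    total = sum(new_belief.values())
    return {v: p / total for v, p in new_belief.items()}

# Example: two noisy turns about a "route" slot in a bus-timetable query.
belief = {}
belief = update_belief(belief, [("61C", 0.6), ("61A", 0.3)])
belief = update_belief(belief, [("61C", 0.5)])
print(max(belief, key=belief.get))  # most likely user goal so far: 61C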

Recently, a host of models have been proposed for dialog state tracking. However, comparisons among models are rare, and different research groups use different data from disparate domains. Moreover, there is currently no common dataset that enables off-line dialog state tracking experiments, so newcomers to the area must either first collect dialog data, which is expensive and time-consuming, or resort to simulated dialog data, which can be unreliable. All of these issues hinder advancing the state of the art.

Challenge format

In this challenge, participants will use a provided set of labeled dialogs to develop a dialog state tracking algorithm. Algorithms will then be evaluated on a common set of held-out dialogs, to enable direct comparisons.

The data for this challenge will be taken from the Spoken Dialog Challenge (Black et al., 2011), which consists of human/machine spoken dialogs with real users (not usability subjects). Before the start of the challenge, a draft of the labeling guide and evaluation measures will be published, and comments will be invited from the community. The organizers will then perform the labeling.

At the start of the challenge – the development phase – participants will be provided with a training set of transcribed and labeled dialogs. Participants will also be given code that implements the evaluation measures. Participants will then have several months to optimize their algorithms.
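
As an illustration of the kind of measure such code might compute, the sketch below scores the per-turn accuracy of a tracker's top goal hypothesis against the reference labels. This is only a hedged example; the definitive metric definitions and file formats are those in the evaluation code distributed by the organizers.

# Hedged sketch of one plausible evaluation measure -- per-turn accuracy of the
# tracker's top hypothesis against the reference label. The official DSTC
# metrics and formats are defined by the organizers' published code; this is
# only an illustration of the general idea.

def top_hypothesis_accuracy(turns):
    """turns: iterable of (predicted_goal, labeled_goal) pairs, one per turn."""
    turns = list(turns)
    correct = sum(1 for predicted, labeled in turns if predicted == labeled)
    return correct / len(turns) if turns else 0.0

print(top_hypothesis_accuracy([("61C", "61C"), ("61A", "61C"), ("61C", "61C")]))
# prints 0.666..., i.e. the top hypothesis matched the label on 2 of 3 turns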

At the end of the challenge, participants will be given an untranscribed and unlabeled test set, and a short period to run their algorithm against the test set. Participants will submit their algorithms’ output to the organizers, who will then perform the evaluation. After the challenge, the test set transcriptions and labels will be made public.

Results of the evaluation will be reported to the community. The identities of participants will not be made public in any written results, whether by the organizers or by other participants; participants may identify themselves only in their own written results.

Schedule

1 July 2012 Beginning of comment period on labeling and evaluation metrics
4-6 July 2012 YRRSDS and SigDial in Korea: announcement of comment period
17 August 2012 End of comment period on labeling and evaluation metrics
31 August 2012 Evaluation metrics and labeling guide published; labeling begins
3-5 December 2012 IEEE SLT 2012 (Miami): Information session, 5 Dec 3 PM
10 December 2012 Labeling ends; data available; challenge begins (14 weeks)
22 March 2013 Final system due; evaluation begins (1 week)
29 March 2013 Evaluation output due to organizers
5 April 2013 Results sent to teams
3 May 2013 SigDial paper deadline; write up (4 weeks)
August 2013 SigDial 2013

Organizers

Jason D. Williams, Microsoft Research, USA (chair)
Alan Black, Carnegie Mellon University, USA
Deepak Ramachandran, Honda Research Institute, USA
Antoine Raux, Honda Research Institute, USA

Advisory committee

Daniel Boies, Microsoft, Canada
Paul Crook, Microsoft, USA
Maxine Eskenazi, Carnegie Mellon University, USA
Milica Gasic, University of Cambridge, UK
Dilek Hakkani-Tur, Microsoft, USA
Helen Hastie, Heriot-Watt University, UK
Kee-Eung Kim, KAIST, Korea
Ian Lane, Carnegie Mellon University, USA
Sungjin Lee, Carnegie Mellon University, USA
Teruhisa Misu, NICT, Japan
Olivier Pietquin, SUPELEC, France
Joelle Pineau, McGill University, Canada
Blaise Thomson, University of Cambridge, UK
David Traum, USC Institute for Creative Technologies, USA
Luke Zettlemoyer, University of Washington, USA

More information

The Dialog State Tracking Challenge. Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. Proceedings of the SIGDIAL 2013 Conference, Metz, France, August 2013.

Dialog state tracking challenge handbook. Antoine Raux, Deepak Ramachandran, Alan Black, and Jason D. Williams. Technical Report.

A belief tracking challenge task for spoken dialog systems. Jason D. Williams. Proceedings of the NAACL HLT 2012 Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data, June 2012.

Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results. Alan W. Black, Susanne Burger, Alistair Conkie, Helen Hastie, Simon Keizer, Oliver Lemon, Nicolas Merigaud, Gabriel Parent, Gabriel Schubiner, Blaise Thomson, Jason D. Williams, Kai Yu, Steve Young, and Maxine Eskenazi. Proceedings of SIGDIAL, Portland, Oregon, USA, 2011.