Dialog State Tracking Challenge

About

The Dialog State Tracking Challenge (DSTC) is an ongoing series of research community challenge tasks. Each task released dialog data labeled with dialog state information, such as the user’s desired restaurant search query given all of the dialog history up to the current turn. The challenge is to create a “tracker” that can predict the dialog state for new dialogs. In each challenge, trackers are evaluated using held-out dialog data.
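To make the task concrete, the following toy Python sketch shows what labeled dialog data and a trivial tracker might look like for a restaurant-search domain. The field names and the baseline logic are illustrative assumptions for this example only; they are not the actual DSTC data format or any released baseline.

    # Illustrative sketch only: hypothetical field names, not the DSTC file format.
    dialog = [
        {
            "system": "How may I help you?",
            # n-best speech understanding hypotheses with confidence scores
            "slu_nbest": [({"food": "thai"}, 0.6), ({"food": "tai chi"}, 0.2)],
            # the labeled dialog state a tracker should recover at this turn
            "label": {"food": "thai"},
        },
        {
            "system": "A Thai restaurant. In which area?",
            "slu_nbest": [({"area": "centre"}, 0.8)],
            "label": {"food": "thai", "area": "centre"},
        },
    ]

    def predict_state(turns_so_far):
        """Trivial baseline tracker: accept the top SLU hypothesis at every turn."""
        state = {}
        for turn in turns_so_far:
            best_hyp, _score = max(turn["slu_nbest"], key=lambda h: h[1])
            state.update(best_hyp)
        return state

    print(predict_state(dialog))  # {'food': 'thai', 'area': 'centre'}

A tracker entered into a challenge is scored by comparing predictions like this against the labels on dialogs it has never seen.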

DSTC1

DSTC1 used human-computer dialogs in the bus timetable domain. Results were presented in a special session at SIGDIAL 2013. DSTC1 was organized by Jason D. Williams, Alan Black, Deepak Ramachandran, and Antoine Raux.

For DSTC1 data and more information, see the DSTC1 sections below.

DSTC2 and DSTC3

DSTC2/3 used human-computer dialogs in the restaurant information domain. Results were presented in special sessions at SIGDIAL 2014 and IEEE SLT 2014. DSTC2 and 3 were organized by Matthew Henderson, Blaise Thomson, and Jason D. Williams.

For DSTC2 and DSTC3 data, and more information, see the DSTC2/3 website.

DSTC4

DSTC4 used human-human dialogs in the tourist information domain. Results were presented at IWSDS 2015. DSTC4 was organized by Seokhwan Kim, Luis F. D’Haro, Rafael E. Banchs, Matthew Henderson, and Jason D. Williams.

For more information about DSTC4, see the DSTC4 website.

DSTC5

DSTC5 used human-human dialogs in the tourist information domain, where training dialogs were provided in one language and test dialogs were in a different language. Results will be presented in a special session at IEEE SLT 2016. DSTC5 was organized by Seokhwan Kim, Luis F. D’Haro, Rafael E. Banchs, Matthew Henderson, Jason D. Williams, and Koichiro Yoshino.

For more information about DSTC5, see the DSTC5 website.

Mailing list

To join the mailing list, send an email to:

listserv@lists.research.microsoft.com

Put “subscribe DSTC” in the body of the message (without the quotes).

To post a message, email:

dstc@lists.research.microsoft.com

Overview publications

Summary of DSTC1, 2, and 3

The Dialog State Tracking Challenge Series: A Review. Jason D. Williams, Antoine Raux, and Matthew Henderson. Dialogue & Discourse, April 2016.

DSTC1

The Dialog State Tracking Challenge. Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. Proceedings of the SIGDIAL 2013 Conference, Metz, France, August 2013.

DSTC2

The Second Dialog State Tracking Challenge. Matthew Henderson, Blaise Thomson, and Jason D. Williams. Proceedings of the SIGDIAL 2014 Conference, Philadelphia, USA, June 2014.

DSTC3

The Third Dialog State Tracking Challenge. Matthew Henderson, Blaise Thomson, and Jason D. Williams. Proceedings of the IEEE Spoken Language Technology Workshop (SLT), South Lake Tahoe, USA, December 2014.

DSTC4

The Fourth Dialog State Tracking Challenge. Seokhwan Kim, Luis F. D’Haro, Rafael E. Banchs, Matthew Henderson, and Jason D. Williams. Proceedings of the International Workshop on Spoken Dialogue Systems (IWSDS).

DSTC5

The Fifth Dialog State Tracking Challenge. Seokhwan Kim, Luis F. D’Haro, Rafael E. Banchs, Matthew Henderson, Jason D. Williams, and Koichiro Yoshino. Proceedings of the IEEE Spoken Language Technology Workshop (SLT), San Diego, USA, December 2016.

DSTC1 information

Background and motivation

In dialog systems, “state tracking” (sometimes also called “belief tracking”) refers to accurately estimating the user’s goal as a dialog progresses. Accurate state tracking is desirable because it provides robustness to errors in speech recognition and helps reduce the ambiguity inherent in language over the course of a dialog. Dialog state tracking is an important problem both for traditional unimodal dialog systems and for speech-enabled multi-modal dialog systems on mobile devices, on tablet computers, and in automobiles.
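As a minimal sketch of the idea, and not any particular published model, a tracker for a single slot (say, a bus route in a timetable domain) might accumulate speech-understanding confidence for each candidate value across turns, so that repeated weak evidence can overturn a single misrecognition. The function and inputs below are illustrative assumptions only.

    from collections import defaultdict

    def update_belief(belief, slu_nbest):
        """Toy evidence accumulation for one slot: add each hypothesis's
        confidence to the running score of the value it proposes, then
        renormalize. Real trackers model the dialog far more carefully."""
        scores = defaultdict(float, belief)
        for value, confidence in slu_nbest:
            scores[value] += confidence
        total = sum(scores.values())
        return {value: score / total for value, score in scores.items()}

    # Turn 1: the recognizer slightly prefers the wrong bus route.
    belief = update_belief({}, [("61C", 0.45), ("61A", 0.40)])
    # Turn 2: the user repeats themselves; evidence for "61A" accumulates.
    belief = update_belief(belief, [("61A", 0.50), ("61C", 0.10)])
    best = max(belief, key=belief.get)
    print(best, round(belief[best], 2))  # 61A 0.61

Even this crude accumulation illustrates why tracking over the whole dialog history can be more robust than trusting the single most recent recognition result.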

A host of models have recently been proposed for dialog state tracking. However, comparisons among models are rare, and different research groups use different data from disparate domains. Moreover, there is currently no common dataset that enables off-line dialog state tracking experiments, so newcomers to the area must either first collect dialog data, which is expensive and time-consuming, or resort to simulated dialog data, which can be unreliable. All of these issues hinder progress on the state of the art.

Challenge format

In this challenge, participants will use a provided set of labeled dialogs to develop a dialog state tracking algorithm. Algorithms will then be evaluated on a common set of held-out dialogs to enable direct comparison.

The data for this challenge will be taken from the Spoken Dialog Challenge (Black et al., 2011), which consists of human/machine spoken dialogs with real users (not usability subjects). Before the start of the challenge, a draft of the labeling guide and evaluation measures will be published, and comments will be invited from the community. The organizers will then perform the labeling.

At the start of the challenge (the development phase), participants will be provided with a training set of transcribed and labeled dialogs. Participants will also be given code that implements the evaluation metrics. Participants will then have several months to optimize their algorithms.
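The metrics themselves are defined by the organizers in the labeling guide; purely as an illustration of the simplest kind of measurement, per-turn accuracy of a tracker’s top state hypothesis could be computed along the lines below. The function and field names are invented for this sketch and do not correspond to the released scoring code.

    def top_hypothesis_accuracy(predictions, labels):
        """Fraction of turns where the tracker's single best state guess
        exactly matches the labeled state. Illustrative only; the released
        scoring code covers additional measures as well."""
        correct = sum(1 for pred, gold in zip(predictions, labels) if pred == gold)
        return correct / len(labels)

    predictions = [{"route": "61A"}, {"route": "61A", "from": "downtown"}]
    labels = [{"route": "61C"}, {"route": "61A", "from": "downtown"}]
    print(top_hypothesis_accuracy(predictions, labels))  # 0.5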

At the end of the challenge, participants will be given an untranscribed and unlabeled test set, and a short period to run their algorithm against the test set. Participants will submit their algorithms’ output to the organizers, who will then perform the evaluation. After the challenge, the test set transcriptions and labels will be made public.

Results of the evaluation will be reported to the community. The identities of participants will not be made public in any written results from the organizers or other participants; participants may identify themselves (and only themselves) in their own written results.

Schedule

1 July 2012 Beginning of comment period on labeling and evaluation metrics
4-6 July 2012 YRRSDS and SigDial in Korea: announcement of comment period
17 August 2012 End of comment period on labeling and evaluation metrics
31 August 2012 Evaluation metrics and labeling guide published; labeling begins
3-5 December 2012 IEEE SLT 2012 (Miami): Information session, 5 Dec 3 PM
10 December 2012 Labeling ends; data available; challenge begins (14 weeks)
22 March 2013 Final system due; evaluation begins (1 week)
29 March 2013 Evaluation output due to organizers
5 April 2013 Results sent to teams
3 May 2013 SigDial paper deadline; write up (4 weeks)
August 2013 SigDial 2013

Organizers

Jason D. Williams, Microsoft Research, USA (chair)
Alan Black, Carnegie Mellon University, USA
Deepak Ramachandran, Honda Research Institute, USA
Antoine Raux, Honda Research Institute, USA

Advisory committee

Daniel Boies, Microsoft, Canada
Paul Crook, Microsoft, USA
Maxine Eskenazi, Carnegie Mellon University, USA
Milica Gasic, University of Cambridge, UK
Dilek Hakkani-Tur, Microsoft, USA
Helen Hastie, Heriot-Watt University, UK
Kee-Eung Kim, KAIST, Korea
Ian Lane, Carnegie Mellon University, USA
Sungjin Lee, Carnegie Mellon University, USA
Teruhisa Misu, NICT, Japan
Olivier Pietquin, SUPELEC, France
Joelle Pineau, McGill University, Canada
Blaise Thomson, University of Cambridge, UK
David Traum, USC Institute for Creative Technologies, USA
Luke Zettlemoyer, University of Washington, USA

More information

The Dialog State Tracking Challenge. Jason D. Williams, Antoine Raux, Deepak Ramachandran, and Alan Black. Proceedings of the SIGDIAL 2013 Conference, Metz, France, August 2013.

Dialog State Tracking Challenge Handbook. Antoine Raux, Deepak Ramachandran, Alan Black, and Jason D. Williams. Technical Report.

A Belief Tracking Challenge Task for Spoken Dialog Systems. Jason D. Williams. Proceedings of the NAACL-HLT 2012 Workshop on Future Directions and Needs in the Spoken Dialog Community: Tools and Data, June 2012.

Spoken Dialog Challenge 2010: Comparison of Live and Control Test Results. Alan W. Black, Susanne Burger, Alistair Conkie, Helen Hastie, Simon Keizer, Oliver Lemon, Nicolas Merigaud, Gabriel Parent, Gabriel Schubiner, Blaise Thomson, Jason D. Williams, Kai Yu, Steve Young, and Maxine Eskenazi. Proceedings of SIGDIAL, Portland, Oregon, USA, 2011.

DSTC1 downloads

Training data

Note: train1b and train1c are larger sets of calls with transcriptions but WITHOUT correctness labels.

Test data

Support materials

DSTC1 results