Empirical Methods for Evaluating Dialog Systems
- Tim Paek
ACL 2001 Workshop on Evaluation Methodologies for Language and Dialogue Systems
We examine what purpose a dialog metric serves and then propose empirical methods for evaluating dialog systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark, or “gold standard,” for making comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.
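The comparative-judgment idea in the abstract can be sketched concretely: wizard responses from a Wizard-of-Oz experiment serve as gold-standard actions, and simple descriptive statistics summarize how closely the system matches them. The data, action labels, and function names below are purely hypothetical illustrations, not the paper's actual protocol or statistics.

```python
# Hypothetical sketch: compare system actions against wizard ("gold standard")
# actions collected for the same dialog states, and summarize agreement.
# All data and labels here are invented for illustration.

from collections import Counter

# Each pair: (system action, wizard action) at the same dialog state.
paired_actions = [
    ("confirm", "confirm"),
    ("ask_repeat", "confirm"),
    ("confirm", "confirm"),
    ("reject", "reject"),
    ("ask_repeat", "ask_repeat"),
]

def agreement_rate(pairs):
    """Fraction of dialog states where the system chose the wizard's action."""
    matches = sum(1 for sys_a, wiz_a in pairs if sys_a == wiz_a)
    return matches / len(pairs)

def per_action_errors(pairs):
    """Count mismatches keyed by the wizard's (gold) action,
    a crude starting point for component analysis."""
    return Counter(wiz_a for sys_a, wiz_a in pairs if sys_a != wiz_a)

print(agreement_rate(paired_actions))    # → 0.8
print(per_action_errors(paired_actions))  # → Counter({'confirm': 1})
```

A real study would pair many more dialog states and report interval estimates rather than a single rate, but the structure — system output scored against wizard output — is the same.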