Empirical Methods for Evaluating Dialog Systems
- Tim Paek
ACL 2001 Workshop on Evaluation Methodologies for Language and Dialogue Systems
We examine what purpose a dialog metric serves and then propose empirical methods for evaluating dialog systems that meet that purpose. The methods include a protocol for conducting a Wizard-of-Oz experiment and a basic set of descriptive statistics for substantiating performance claims, using the data collected from the experiment as an ideal benchmark, or “gold standard,” for making comparative judgments. The methods also provide a practical means of optimizing the system through component analysis and cost valuation.
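The comparative-judgment idea in the abstract can be sketched concretely: wizard responses from a Wizard-of-Oz experiment serve as gold-standard actions, and simple descriptive statistics summarize how closely the system matches them. The data, action labels, and function names below are purely hypothetical illustrations, not the paper's actual protocol or statistics.

```python
# Hypothetical sketch: compare system actions against wizard ("gold standard")
# actions collected for the same dialog states, and summarize agreement.
# All data and labels here are invented for illustration.

from collections import Counter

# Each pair: (system action, wizard action) at the same dialog state.
paired_actions = [
    ("confirm", "confirm"),
    ("ask_repeat", "confirm"),
    ("confirm", "confirm"),
    ("reject", "reject"),
    ("ask_repeat", "ask_repeat"),
]

def agreement_rate(pairs):
    """Fraction of dialog states where the system chose the wizard's action."""
    matches = sum(1 for sys_a, wiz_a in pairs if sys_a == wiz_a)
    return matches / len(pairs)

def per_action_errors(pairs):
    """Count mismatches keyed by the wizard's (gold) action,
    a crude starting point for component analysis."""
    return Counter(wiz_a for sys_a, wiz_a in pairs if sys_a != wiz_a)

print(agreement_rate(paired_actions))    # → 0.8
print(per_action_errors(paired_actions))  # → Counter({'confirm': 1})
```

A real study would pair many more dialog states and report interval estimates rather than a single rate, but the structure — system output scored against wizard output — is the same.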