On Evaluating Language Technologies


August 11, 2008


Ellen Voorhees


National Institute of Standards and Technology


The Text REtrieval Conference (TREC) is an ongoing series of evaluation workshops that has standardized and validated the use of test collections as a research tool for improving document retrieval. A retrieval test collection is a carefully calibrated abstraction of a retrieval task that Sparck Jones called the "core competency" of search: a task that is necessary, but not sufficient, for user retrieval tasks. The abstraction facilitates research by controlling for some sources of variability, thus increasing the power of experiments that compare system effectiveness while reducing their cost.

We have extended the test collection paradigm to other language technologies, including question answering, summarization, and textual entailment, by defining abstracted evaluation tasks for them. This talk will provide a brief history of the NIST evaluations for these tasks, examining both the validity of an evaluation (Are the conclusions of the evaluation true?) and the utility of an evaluation (Are the conclusions helpful?).


Ellen Voorhees

Ellen Voorhees is manager of the Retrieval Group in the Information Access Division of the National Institute of Standards and Technology (NIST). The Retrieval Group is home to the Text REtrieval Conference, TRECVid (an evaluation of content-based access to digital video), and the new Text Analysis Conference (TAC, an evaluation conference for the natural language processing community). Prior to joining NIST in 1996, she was a senior research scientist at Siemens Corporate Research in Princeton, NJ. She received her PhD from Cornell University, where she studied under Gerard Salton.