Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics

Chin-Yew Lin; E.H. Hovy

The 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics | May 2003

Organized by HLT | NAACL

Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
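
To make the idea of unigram co-occurrence between a summary pair concrete, here is a minimal sketch in Python. It is an illustration only, not the scoring procedure from the paper: the abstract does not give the exact formula, so the recall-style normalization (dividing by the reference length), the lowercase whitespace tokenization, and the function name `unigram_cooccurrence` are all assumptions.

```python
from collections import Counter


def unigram_cooccurrence(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams.

    Assumptions (not taken from the paper): lowercase whitespace
    tokenization and recall-style normalization by reference length.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token,
    # so a word repeated in the candidate is not credited more often
    # than it appears in the reference.
    overlap = sum((cand_counts & ref_counts).values())
    return overlap / sum(ref_counts.values())


# Hypothetical summary pair: five of the six reference tokens are matched.
score = unigram_cooccurrence("the cat sat on the mat", "the cat lay on the mat")
print(f"{score:.3f}")
```

A score of this kind is computed per summary pair; comparing it against human evaluations would additionally require a correlation statistic over many pairs, which this sketch does not cover.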