Abstract

In document information retrieval, the terminology given by a user may not match the terminology of a relevant document. Query expansion seeks to address this mismatch; it can significantly increase effectiveness, but is slow and resource-intensive. We investigate the use of document expansion as an alternative, in which documents are augmented with related terms extracted from the corpus during indexing, and the overheads at query time are small. We propose and explore a range of corpus-based document expansion techniques and compare them to corpus-based query expansion on TREC data. These experiments show that document expansion delivers at best limited benefits, while query expansion (including standard techniques and efficient approaches described in recent work) delivers consistent gains. We conclude that document expansion is unpromising, but it is likely that the efficiency of query expansion can be further improved.