Information retrieval systems can attempt to answer the user’s query directly, by extracting an appropriate passage of text from a corpus and presenting it on the results page. However, sometimes the passage contains extraneous information, or multiple passages are needed to form an answer. In such cases, an answer distillation system could be useful: it takes as input the query and the answer-containing passage, and produces a succinct answer for presentation to the user. We formulate answer distillation as a sub-problem of machine comprehension and natural language generation, drawing on techniques from neural machine learning, information retrieval, and natural language processing. Progress on answer distillation would benefit from a dataset consisting of many examples of query-passage pairs with their corresponding “ground-truth” distilled answers. We also need a metric to measure the quality of the distilled answers.

In this paper we share our early ideas on building such a dataset and solicit feedback from the community. Our goal is to align our needs for an answer distillation dataset with the needs of future academic research in this space. In particular, we propose that having a large number of reference answers available per query would be beneficial, and consequently suggest extensions to metrics like BLEU and METEOR for the scenario where many reference answers are available.
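
As a point of reference, standard BLEU already accommodates multiple reference answers through clipped n-gram counts: each candidate n-gram is credited at most as many times as it appears in any single reference. The sketch below illustrates only that baseline computation, not the extensions suggested in this paper; the function names and the toy sentences are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references.

    This is the standard BLEU building block, shown here as a baseline;
    it is not the multi-reference extension proposed in the paper.
    """
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0

    # For each n-gram, keep the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Clip each candidate count by the best reference count.
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical distilled answer and two hypothetical reference answers.
candidate = "the the cat".split()
references = ["the cat sat".split(), "a cat is on the mat".split()]
unigram_precision = modified_precision(candidate, references, 1)
bigram_precision = modified_precision(candidate, references, 2)
```

With more references per query, more valid phrasings of an answer can receive credit, which is one reason a large reference set is attractive for this kind of metric.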