
NewsQA Dataset

With massive volumes of written text being produced every second, how can we ensure that the most recent and relevant information is available to us? Microsoft Research Montreal is tackling this problem by building AI systems that can read and comprehend large volumes of complex text in real time.

The purpose of the NewsQA dataset is to help the research community build algorithms that are capable of answering questions requiring human-level comprehension and reasoning skills.

Using CNN articles from the DeepMind Q&A Dataset, we built a crowdsourced machine reading comprehension dataset of 120K question-answer pairs.

  • Documents are CNN news articles.
  • Questions are written by human users in natural language.
  • Answers may be multiword passages of the source text.
  • Questions may be unanswerable.
  • NewsQA is collected using a 3-stage, siloed process.
  • Questioners see only an article’s headline and highlights.
  • Answerers see the question and the full article, then select an answer passage.
  • Validators see the article, the question, and a set of answers that they rank.
  • NewsQA is more natural and more challenging than previous datasets.
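To make the answer format concrete, here is a minimal sketch of how a single question-answer pair might be held in memory. It assumes answers are stored as character offsets into the article and that an unanswerable question simply carries no span. The `QAExample` class, the `span_of` helper, and the sample article are invented for illustration; they are not the official NewsQA file format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QAExample:
    """Hypothetical record for one NewsQA-style Q&A pair (not the official schema)."""
    article: str                            # full article text, as the answerer sees it
    question: str                           # written by a questioner who saw only headline/highlights
    answer_span: Optional[Tuple[int, int]]  # character offsets [start, end), or None if unanswerable

    def answer_text(self) -> Optional[str]:
        """Return the selected answer passage, or None for an unanswerable question."""
        if self.answer_span is None:
            return None
        start, end = self.answer_span
        return self.article[start:end]


def span_of(article: str, phrase: str) -> Tuple[int, int]:
    """Illustrative helper: locate a phrase and return its [start, end) offsets."""
    start = article.index(phrase)  # raises ValueError if the phrase is absent
    return (start, start + len(phrase))


# Invented sample text, used only to show single-word, multiword, and missing answers.
article = "The storm hit Miami on Tuesday, leaving 20,000 homes without power."

single = QAExample(article, "Where did the storm hit?", span_of(article, "Miami"))
multiword = QAExample(article, "How many homes lost power?", span_of(article, "20,000 homes"))
unanswerable = QAExample(article, "Who is the mayor of Miami?", None)

print(single.answer_text())        # Miami
print(multiword.answer_text())     # 20,000 homes
print(unanswerable.answer_text())  # None
```

Storing offsets rather than copied strings keeps each answer tied to a specific passage of the source text, which matches how answerers select a passage from the full article.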

Challenges

A significant proportion of questions in NewsQA cannot be answered without reasoning. The reasoning types we have identified in our analysis are as follows:

  • Synthesis: Some answers can only be inferred by synthesizing information distributed across multiple sentences.
  • Paraphrasing: A single sentence in the article might entail or paraphrase the question. Paraphrase recognition may require synonymy and world knowledge.
  • Inference: Some answers must be inferred from incomplete information in the article or by recognizing conceptual overlap. This typically draws on general knowledge.
  • Additionally, some questions have no answer, or no unique answer, in the corresponding article, so a system must learn to recognize when the given information is insufficient.
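One common way to let a reader abstain, used by many extractive QA systems though not described here as NewsQA's own method, is to score a "no answer" option alongside the candidate spans and predict no answer unless the best span beats it by a tuned margin. The sketch below assumes some model has already produced those scores; `choose_answer`, its inputs, and the default threshold are hypothetical.

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # character offsets [start, end) into the article


def choose_answer(span_scores: Dict[Span, float],
                  null_score: float,
                  threshold: float = 0.0) -> Optional[Span]:
    """Return the best-scoring span, or None if the 'no answer' option wins.

    span_scores: scores for candidate spans from some reader model (assumed input).
    null_score:  the same model's score for "the article does not answer this".
    threshold:   margin the best span must exceed null_score by; would be tuned on dev data.
    """
    if not span_scores:
        return None  # no candidates at all: abstain
    best_span = max(span_scores, key=span_scores.get)
    if span_scores[best_span] - null_score > threshold:
        return best_span
    return None  # evidence judged insufficient


print(choose_answer({(14, 19): 2.5, (40, 52): 0.7}, null_score=1.0))  # (14, 19)
print(choose_answer({(14, 19): 0.4}, null_score=1.0))                 # None
```

Raising `threshold` makes the system abstain more often, trading coverage on answerable questions for fewer confident answers to unanswerable ones.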

See other datasets from Microsoft Research Montreal: Frames and FigureQA.