Question Answering with Knowledge Base, Web and Beyond

  • Hao Ma
  • Scott Wen-tau Yih

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts

Published by Association for Computational Linguistics

Developing a Question Answering (QA) system that automatically answers natural language questions has been a long-standing research problem since the dawn of AI, because of its clear practical and scientific value. For instance, whether a system can answer questions correctly is a natural way to evaluate a machine’s understanding of a domain. Providing succinct and precise answers to informational queries is also the direction pursued by the next generation of search engines, which aim to incorporate more “semantics”, and it is a basic function of digital assistants like Siri and Cortana.

In this tutorial, we aim to give the audience a coherent overview of question answering research. We will first introduce a variety of QA problems proposed by pioneering researchers and briefly describe the early efforts. By contrasting these with current research trends, the audience can see which technical problems remain challenging, what the main breakthroughs of the past half century have been, and where the opportunities lie. For the rest of the tutorial, we select three categories of QA problems that have recently attracted a great deal of attention in the research community, and present each task together with a survey of the latest techniques.

The first two categories concern answering factoid questions; the main difference between the two problem settings is the information source used for extracting answers. QA with knowledge base aims to answer natural language questions using real-world facts stored in an existing, large-scale database. The representative approach for this task is to develop a semantic parser for questions, which will be the main focus. Other approaches, such as text matching in the embedding space and those driven by information extraction, will also be discussed. The second category, QA with the Web, targets answering questions mainly using facts extracted from general text corpora derived from the Web. In addition to the common components and techniques used in this setting, including passage retrieval, entity recognition and question analysis, we will also introduce the latest work on how to leverage and incorporate additional structured and semi-structured data to improve performance. The third category of QA problems that we will highlight is non-factoid questions. Because this category is broad, we will briefly cover three exemplary topics: story comprehension, reasoning questions and paragraph QA. The tutorial will conclude by summarizing an area of exciting and dynamic research that is worthy of more detailed investigation for many years to come.