DIY: Assessing the Correctness of Natural Language to SQL Systems

  • Arpit Narechania
  • Adam Fourney
  • Gonzalo Ramos
  • Bongshin Lee

Annual Conference on Intelligent User Interfaces |

Designing natural language interfaces for querying databases remains an important goal pursued by researchers in natural language processing, databases, and HCI. These systems receive natural language as input, translate it into a formal database query, and execute the query to compute a result. Because the responses from these systems are not always correct, it is important to provide people
with mechanisms to assess the correctness of the generated query and computed result. However, this assessment can be challenging for people who lack expertise in query languages. We present
Debug-It-Yourself (DIY), an interactive technique that enables users to assess the responses from a state-of-the-art natural language to SQL (NL2SQL) system for correctness and, if possible, fix errors.
DIY provides users with a sandbox where they can interact with (1) the mappings between the question and the generated query, (2) a small-but-relevant subset of the underlying database, and (3) a multi-modal explanation of the generated query. End-users can then employ a back-of-the-envelope calculation debugging strategy to evaluate the system’s response. Through an exploratory study with 12 users, we investigate how DIY helps users assess the correctness of the system’s answers and detect & fix errors. Our observations reveal the benefits of DIY while providing insights about end-user debugging strategies and underscore opportunities for further improving the user experience.