Automated Reasoning of Database Queries

Date

November 13, 2018

Speaker

Shumo Chu

Affiliation

University of Washington

Overview

From booking air tickets to analyzing astronomy datasets, database queries are pervasive in people’s work and life. However, reasoning about database queries automatically is not easy. The problem has been shown to be undecidable in general, and extensive studies from the database community focus on its theoretical limitations.

In this talk, I am going to present Cosette, the first tool for checking the semantic equivalence of SQL queries. The core of Cosette is a formal semantics of SQL based on semirings. This semantics covers major SQL features, including sophisticated ones such as grouping, aggregation, correlated subqueries, and integrity constraints. The semantics is also denotational, and it interprets SQL by adding only a few equational axioms to semirings.
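
To make the semiring idea concrete, here is a minimal sketch in Python. It is not Cosette's actual semantics. It assumes the simplest case, bag semantics over the natural-number semiring (N, +, ×, 0, 1), where a relation maps each tuple to its multiplicity. The function names (`select`, `project`, `union_all`, `product`) and the sample relations are illustrative assumptions, not part of the talk.

```python
from collections import Counter

# Assumption: a relation is a Counter mapping each tuple to its multiplicity,
# i.e. an element of the natural-number semiring (N, +, *, 0, 1).

def select(rel, pred):
    # Selection multiplies each multiplicity by 1 if the predicate holds, else 0.
    return Counter({t: n for t, n in rel.items() if pred(t)})

def project(rel, cols):
    # Projection sums the multiplicities of tuples that collapse together.
    out = Counter()
    for t, n in rel.items():
        out[tuple(t[c] for c in cols)] += n
    return out

def union_all(r, s):
    # UNION ALL adds multiplicities (all counts here are positive).
    return r + s

def product(r, s):
    # Cross product multiplies multiplicities.
    out = Counter()
    for t1, n1 in r.items():
        for t2, n2 in s.items():
            out[t1 + t2] += n1 * n2
    return out

# Hypothetical sample data: two relations with schema (a, b).
R = Counter({(1, 10): 2, (1, 20): 1, (2, 10): 1})
S = Counter({(1, 10): 1, (3, 30): 4})
a_is_one = lambda t: t[0] == 1

# Selection distributes over UNION ALL, mirroring distributivity in the semiring.
lhs = select(union_all(R, S), a_is_one)
rhs = union_all(select(R, a_is_one), select(S, a_is_one))
```

In this sketch, `lhs` and `rhs` agree on the sample data because multiplication distributes over addition in the semiring. A prover-based approach would establish such an identity for all inputs rather than for one example.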

Then, to check equivalence, Cosette uses this semantics to encode a pair of input SQL queries in both an interactive theorem prover and a constraint solver. Cosette then either certifies their equivalence using a sound decision procedure, implemented in the theorem prover, that covers the known decidable fragment of SQL, or shows their inequivalence by providing a counterexample. Empirical studies show that Cosette can decide equivalence or provide a counterexample for a wide range of practical SQL queries collected from the database literature, real-world optimizer rules and bugs, and homework assignments from a data management class at UW.
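
The counterexample side can be illustrated with a small sketch. The abstract says Cosette uses a constraint solver for this step. The Python below swaps in plain brute-force enumeration over tiny databases instead, so it only shows what a counterexample is, not how Cosette finds one. The queries `q1` and `q2`, the helper `find_counterexample`, and its `domain` and `max_mult` parameters are all hypothetical.

```python
from collections import Counter
from itertools import product

# Assumption: R has schema (a, b) and is a bag, stored as tuple -> multiplicity.

def q1(r):
    # SELECT a FROM R  (bag semantics: duplicates are kept)
    out = Counter()
    for (a, b), n in r.items():
        out[(a,)] += n
    return out

def q2(r):
    # SELECT DISTINCT a FROM R  (each value of a appears exactly once)
    return Counter({(a,): 1 for (a, b) in r})

def find_counterexample(qa, qb, domain=(0, 1), max_mult=2):
    # Enumerate every bag over domain x domain with bounded multiplicities
    # and return the first one on which the two queries disagree.
    tuples = list(product(domain, repeat=2))
    for mults in product(range(max_mult + 1), repeat=len(tuples)):
        r = Counter({t: m for t, m in zip(tuples, mults) if m > 0})
        if qa(r) != qb(r):
            return r
    return None  # no difference found within the bounds; this proves nothing

cex = find_counterexample(q1, q2)
```

Here `cex` is a small instance of `R` on which the two queries return different bags, which is enough to show they are not equivalent. A result of `None` only means no difference exists within the chosen bounds, which is why certifying equivalence needs a proof rather than a search.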

View presentation slides at https://www.microsoft.com/en-us/research/uploads/prod/2018/12/Automated-Reasoning-of-Database-Queries-SLIDES.pdf

Speakers

Shumo Chu

Shumo Chu is a doctoral student at the University of Washington. His research interests are in data management and programming languages. Shumo’s dissertation focuses on developing algorithms and systems for automated reasoning about database queries, which requires leveraging both interactive theorem prover and constraint solver techniques. In the past, he worked on large-scale graph processing and query processing for distributed database systems. He has also done internships with the DMX group at Microsoft Research and with Google’s Spanner team.
