From booking air tickets to analyzing astronomy datasets, database queries are pervasive in people’s work and life. However, reasoning database queries automatically is not easy. It is shown to be undecidable in general. And there are extensive studies from database community that are focus on the theoretical limitations.
In this talk, I am going to present Cosette, the first tool for checking the semantic equivalence of SQL queries. The core of Cosette is a formal semantics of SQL based on semirings. This semantics covers major SQL features, including sophisticated ones such as grouping, aggregate, correlated subqueries, and integrity constraints. Also, this semantics is denotational and only adds a few equational axioms, as the interpretation of SQL, to semirings.
Then, to check the equivalences, Cosette uses this semantics to encode a pair of input SQL queries in both an interactive theorem prover and a constraint solver. In the end, Cosette will either certify their equivalences using a sound decision procedure implemented in a theorem prover that covers the known decidable fragment of SQL, or show their inequivalence by providing a counter-example. Empirical studies show that Cosette can decide the equivalence or provide counter example for a wide range of practical SQL queries collected from database literature, real-world optimizer rules and bugs, and data management class homework assignment from UW.
View presentation slides at https://www.microsoft.com/en-us/research/uploads/prod/2018/12/Automated-Reasoning-of-Database-Queries-SLIDES.pdf