{"id":1116021,"date":"2025-02-11T14:44:54","date_gmt":"2025-02-11T22:44:54","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=1116021"},"modified":"2025-02-11T16:38:35","modified_gmt":"2025-02-12T00:38:35","slug":"qure","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/qure\/","title":{"rendered":"QURE"},"content":{"rendered":"\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-cover alignwide is-light has-custom-content-position is-position-center-left is-style-default\" style=\"min-height:174px;aspect-ratio:unset;\"><span aria-hidden=\"true\" class=\"wp-block-cover__background has-background-dim-10 has-background-dim\"><\/span><div class=\"wp-block-cover__inner-container is-layout-flow wp-block-cover-is-layout-flow\">\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained wp-container-1 is-position-sticky\">\n<h1 class=\"wp-block-heading is-style-xl\" id=\"qure-ai-assisted-and-automatically-verified-udf-to-sql-translation-1\"><strong>QURE: AI-assisted and Automatically Verified UDF to SQL Translation<\/strong><\/h1>\n<\/div>\n<\/div><\/div>\n<\/div>\n\n\n\n\n\n<p>User-defined functions (UDFs) extend the capabilities of SQL by improving code reusability and encapsulating complex logic, but can hinder the performance due to optimization and execution inefficiencies. Prior approaches attempt to address this by rewriting UDFs into native SQL, which is then inlined into the SQL queries that invoke them. However, these approaches are either limited to simple pattern matching or require the synthesis of complex verification conditions from procedural code, a process that is brittle and difficult to automate. This limits coverage and makes the translation approaches less extensible to unseen procedural constructs. In this work, we present QURE, a framework that (1) leverages large language models (LLMs) to translate UDFs to native SQL, and (2) introduces a novel formal verification method to establish equivalence between the UDF and its translation. QURE uses the semantics of SQL operators to automate the derivation of verification conditions, in turn resulting in broad coverage and high extensibility. We model a large set of imperative constructs, particularly those common in Python and Pandas UDFs, in an intermediate verification language, allowing for the verification of their SQL translation. In our empirical evaluation of Python and Pandas UDFs, equivalence is successfully verified for 88% of UDF-SQL pairs, with LLMs correctly translating 84.8% of these UDFs. The remaining UDFs lack semantically equivalent SQL. Executing the translated UDFs achieves median performance improvements of 23.7x on single-node clusters and 12.5x on 12-node clusters compared to the original UDFs, while also significantly reducing out-of-memory errors.<\/p>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/dl.acm.org\/doi\/10.1145\/3709716\">SIGMOD 2025 paper<br><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2025\/02\/QURE-Extended-Version.pdf\">Extended Version<\/a><br><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/1drv.ms\/u\/s!AszJNP802sp5bdYKYtZSRFH3phk?e=3PzxtS\">QURE Benchmark.zip<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (Python and Pandas UDFs used for evaluating QURE)<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>User-defined functions (UDFs) extend the capabilities of SQL by improving code reusability and encapsulating complex logic, but can hinder the performance due to optimization and execution inefficiencies. Prior approaches attempt to address this by rewriting UDFs into native SQL, which is then inlined into the SQL queries that invoke them. However, these approaches are either [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13563],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-1116021","msr-project","type-msr-project","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[1128738],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Arnd Christian K\u00f6nig","user_id":31427,"people_section":"Related people","alias":"chrisko"},{"type":"user_nicename","display_name":"Shuvendu Lahiri","user_id":33640,"people_section":"Related people","alias":"shuvendu"}],"msr_research_lab":[199565],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1116021","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":33,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1116021\/revisions"}],"predecessor-version":[{"id":1129116,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/1116021\/revisions\/1129116"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1116021"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1116021"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1116021"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1116021"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=1116021"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}