{"id":980586,"date":"2023-10-30T10:21:17","date_gmt":"2023-10-30T17:21:17","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=980586"},"modified":"2024-01-16T06:40:01","modified_gmt":"2024-01-16T14:40:01","slug":"welding-natural-language-queries-to-analytics-irs-with-llms","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/welding-natural-language-queries-to-analytics-irs-with-llms\/","title":{"rendered":"Welding Natural Language Queries to Analytics IRs with LLMs"},"content":{"rendered":"<p style=\"text-align: left;\">From the recent momentum behind translating natural language to SQL (nl2sql), to commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a big impact on data analytics. In this paper, we show that LLMs can be used to convert natural language analytics queries directly to custom intermediate query representations (IRs) of modern data analytics systems. This has the direct benefit of making IRs more accessible to end-users, but interestingly, it can also result in improved translation accuracy and better end-to-end performance, especially when the query semantics is better captured in the IR rather than in SQL. We build an LLM-based pipeline (nl2weld) for one instance of this flow, to translate natural language queries to the Weld IR using gpt-4. nl2weld is carefully designed to harness self-reflection and instruction-following capabilities of gpt-4, providing it various forms of feedback such as domain specific instructions and feedback from the Weld compiler. We evaluate NL2WELD on a subset of the Spider benchmark and compare it against the gold standard SQL and DIN-SQL, a state-of-the-art nl2sql system. We report a comparable accuracy of 77.4% on the dataset, and also demonstrate examples on which nl2weld produces code that is 1.5 \u2212 4\u00d7 faster than the gold standard and DIN-SQL.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>From the recent momentum behind translating natural language to SQL (nl2sql), to commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models (LLMs) are poised to have a big impact on data analytics. In this paper, we show that LLMs can be used to convert natural language analytics queries directly to custom intermediate [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"(To appear in) International Conference on Innovative Data Systems","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-1-14","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/www.cidrdb.org\/cidr2024\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-980586","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-1-14","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/theme\/systems\/","label_id":"252679","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"user_nicename","value":"Kaushik Rajan","user_id":32574,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Kaushik Rajan"},{"type":"user_nicename","value":"Aseem Rastogi","user_id":36021,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Aseem Rastogi"},{"type":"user_nicename","value":"Akash Lal","user_id":30905,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Akash Lal"},{"type":"user_nicename","value":"Sampath Rajendra","user_id":43107,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sampath Rajendra"},{"type":"text","value":"Krithika Subramanian","user_id":0,"rest_url":false},{"type":"text","value":"Krut Patel","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199562,199565],"msr_event":[],"msr_group":[144939],"msr_project":[967329,967350],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":967329,"post_title":"Domain Specialization","post_name":"domain-specialization","post_type":"msr-project","post_date":"2023-10-16 02:14:29","post_modified":"2024-01-12 08:47:20","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/domain-specialization\/","post_excerpt":"Scaling performance beyond Moore's law Domain&nbsp;specialization&nbsp;is&nbsp;expected&nbsp;to&nbsp;play&nbsp;a&nbsp;big&nbsp;role&nbsp;in&nbsp;how&nbsp;computer&nbsp;systems&nbsp;evolve&nbsp;in&nbsp;future. With&nbsp;the&nbsp;end&nbsp;of&nbsp;Moore's&nbsp;law,&nbsp;we&nbsp;are&nbsp;already&nbsp;seeing&nbsp;CPU,&nbsp;GPU&nbsp;and domain specific&nbsp;hardware&nbsp;evolving&nbsp;rapidly. The next decade&nbsp;is&nbsp;therefore&nbsp;expected&nbsp;to&nbsp;see&nbsp;big&nbsp;changes&nbsp;in&nbsp;how&nbsp;we&nbsp;develop,&nbsp;compile&nbsp;and&nbsp;run&nbsp;software. This project focuses on data systems, a class of systems where, as the data sizes grow, performance scaling is going to be of importance.First,&nbsp;we&nbsp;believe&nbsp;that domain-specific&nbsp;compilers&nbsp;will&nbsp;play&nbsp;a&nbsp;crucial&nbsp;strategic&nbsp;role&nbsp;in&nbsp;helping&nbsp;software&nbsp;leverage&nbsp;the&nbsp;changing&nbsp;hardware&nbsp;landscape. Such compilers will be multi-layered and will progressively lower computation through multiple intermediate abstractions, performing domain specific optimizations at the higher layers and specializing code to the hardware in lower layers. We&nbsp;have&nbsp;been&nbsp;working&nbsp;on&nbsp;two&nbsp;such&nbsp;domain&nbsp;specific&nbsp;compilers&nbsp;in&nbsp;the&nbsp;data&nbsp;domain. Second, new hardware specific algorithms need&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967329"}]}},{"ID":967350,"post_title":"AI-Driven Software Engineering","post_name":"967350","post_type":"msr-project","post_date":"2023-09-12 02:52:36","post_modified":"2023-10-09 10:27:33","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/967350\/","post_excerpt":"Using AI to assist every developer build better software faster Generative AI is transforming the way software is built. We conduct research at the forefront of this transformation. We design ML models, algorithms and platforms to improve developer productivity and software reliability. Check out the publications tab to learn more.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967350"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586\/revisions"}],"predecessor-version":[{"id":999141,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/980586\/revisions\/999141"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=980586"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=980586"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=980586"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=980586"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=980586"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=980586"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=980586"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=980586"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=980586"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=980586"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=980586"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=980586"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=980586"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}