{"id":166504,"date":"2014-04-23T00:00:00","date_gmt":"2014-04-23T00:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/msr-research-item\/flashrelate-extracting-relational-data-from-semi-structured-spreadsheets-using-examples\/"},"modified":"2018-10-16T20:19:35","modified_gmt":"2018-10-17T03:19:35","slug":"flashrelate-extracting-relational-data-from-semi-structured-spreadsheets-using-examples","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/flashrelate-extracting-relational-data-from-semi-structured-spreadsheets-using-examples\/","title":{"rendered":"FlashRelate: Extracting Relational Data from Semi-Structured Spreadsheets Using Examples"},"content":{"rendered":"<p>Spreadsheets store a tremendous amount of important data. One reason spreadsheets are so successful is that they are both easy to use and allow users great expressiveness in storing and manipulating their data. This flexibility comes at a price, as presentation elements are often combined with the underlying data model. As a result, many spreadsheets contain data in ad-hoc formats. These formats complicate the use of traditional relational tools which require data in a normalized form. Normalizing data from these formats is often tedious or requires programming, and often, a user may prefer the original presentation.<\/p>\n<p>We describe an approach that allows users to easily extract structured data from spreadsheets without programming. We make two contributions. First, we describe a novel domain specific language called Flare that extends traditional regular expressions with spatial constraints. Second, we describe an algorithm called FlashRelate that can synthesize Flare programs from user-provided positive and negative examples. Using 43 benchmarks drawn both from a standard spreadsheet corpus and from Excel-user help forums, we demonstrate that correct extraction programs can be synthesized quickly from a small number of examples. Our approach generalizes to many data-cleaning tasks on semi-structured spreadsheets.<\/p>\n<p>View the FlashRelate <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/tinyurl.com\/mh3bo3a\">Video Demo<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Spreadsheets store a tremendous amount of important data. One reason spreadsheets are so successful is that they are both easy to use and allow users great expressiveness in storing and manipulating their data. This flexibility comes at a price, as presentation elements are often combined with the underlying data model. As a result, many spreadsheets [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"Microsoft Research Technical Report","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"PLDI 2015","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2014-53","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"PLDI '15 Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"Daniel W. Barowy, Benjamin Zorn","msr_other_contributors":"","msr_speaker":"","msr_award":"Distinguished Artifact Award","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2014-04-23","msr_highlight_text":"","msr_notes":"Distinguished Artifact Award","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":2014,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-166504","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_publishername":"Microsoft Research Technical Report","msr_edition":"PLDI 2015","msr_affiliation":"","msr_published_date":"2014-04-23","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2014-53","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"Distinguished Artifact Award","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"204967","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","title":"flashrelate-tech-report-April2014.pdf","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/flashrelate-tech-report-April2014.pdf","id":204967,"label_id":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":204967,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/02\/flashrelate-tech-report-April2014.pdf"}],"msr-author-ordering":[{"type":"text","value":"Daniel W. Barowy","user_id":0,"rest_url":false},{"type":"user_nicename","value":"sumitg","user_id":33755,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=sumitg"},{"type":"user_nicename","value":"tedhar","user_id":33971,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=tedhar"},{"type":"user_nicename","value":"zorn","user_id":35154,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=zorn"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[663303],"msr_project":[654579,361034],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":654579,"post_title":"PROSE Framework","post_name":"prose-framework","post_type":"msr-project","post_date":"2020-07-22 11:23:08","post_modified":"2022-07-22 10:31:53","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/prose-framework\/","post_excerpt":"Microsoft PROSE SDK is a framework of technologies for programming by examples: automatic generation of programs from input-output examples.","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/654579"}]}},{"ID":361034,"post_title":"Program Synthesis","post_name":"program-synthesis","post_type":"msr-project","post_date":"2017-02-02 13:37:27","post_modified":"2020-01-24 08:27:25","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/program-synthesis\/","post_excerpt":"Introduction Program synthesis is the task of automatically discovering an executable piece of code given user intent expressed using various forms of constraints such as input-output examples, demonstrations, natural language, etc. Program synthesis has direct applications for various classes of users in the technology pyramid: (100s of millions of) End Users (people who have access to a computational device but are not expert programmers): Helping them to create small snippets of code for performing repetitive&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/361034"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166504","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166504\/revisions"}],"predecessor-version":[{"id":526670,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/166504\/revisions\/526670"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=166504"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=166504"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=166504"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=166504"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=166504"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=166504"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=166504"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=166504"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=166504"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=166504"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=166504"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=166504"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=166504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}