{"id":967218,"date":"2023-11-08T14:36:00","date_gmt":"2023-11-08T22:36:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=967218"},"modified":"2023-11-18T10:15:39","modified_gmt":"2023-11-18T18:15:39","slug":"self-service-data-preparation","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/self-service-data-preparation\/","title":{"rendered":"Self-service Data Preparation"},"content":{"rendered":"<section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background bg-gray-200 has-background- card-background--full-bleed\">\n\t\t\t\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"self-service-data-preparation\"><em>Self-service Data Preparation<\/em><\/h1>\n\n\n\n<p><\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n\n\n\n\n\n<p>It is often cited that data scientists spend a significant portion of their time (up to 80%) cleaning and preparing data. For less-technical users (e.g., users of Excel, Power BI and Tableau), who may be less proficient in writing code, preparing and cleaning data is not just time-consuming but also technically challenging. <\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>In the &#8220;<em>Self-service Data Preparation<\/em>&#8221; project, our goal is to develop technologies that can automate common data-preparation tasks in the context of data science and business intelligence workflows. 
We aim to empower technical and non-technical users alike, towards the democratization of data.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Our research has been recognized with best paper awards at VLDB and SIGMOD. Some of our technologies have been integrated into Microsoft products such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/power-query\/power-query-what-is-power-query\">Power Query<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> for <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/powerbi.microsoft.com\/en-us\/\">Power BI<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (program synthesis, operator recommendations, fuzzy join, fuzzy deduplication), <a href=\"https:\/\/www.microsoft.com\/en-us\/microsoft-365\/excel\">Excel<\/a> (error detection, data cleansing), <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/python\/api\/overview\/azure\/dataprep\/intro?view=azure-dataprep-py\">Azure Machine Learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (data prep SDK), <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/purview\/\">Azure Purview<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (auto-tagging in data lakes), <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/products\/data-factory\">Azure Data Factory<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (fuzzy join), and <a 
class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/dynamics.microsoft.com\/en-us\/ai\/customer-insights\/\">Dynamics 365 Customer Insights<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (fuzzy join, fuzzy deduplication). <\/p>\n\n\n\n<p><\/p>\n\n\n\n<p><\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>It is often cited that data scientists spend a significant portion of their time (up to 80%) cleaning and preparing data. For less-technical users (e.g., users of Excel, Power BI and Tableau), who may be less proficient in writing code, preparing and cleaning data is not just time-consuming but also technically challenging. In the [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13563],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-967218","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[698740,957138,950241,950232,940866,847864,762010,739477,732355,329798,654228,610266,578671,575673,496655,481248,480054,372359],"related-downloads":[],"related-videos":[],"related-groups":[957177],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[{"type":"user_nicename","display_name":"Yeye He","user_id":34992,"people_section":"Related people","alias":"yeyehe"},{"type":"user_nicename","display_name":"Vivek Narasayya","user_id":34602,"people_section":"Related people","alias":"viveknar"},{"type":"user_nicename","display_name":"Surajit 
Chaudhuri","user_id":33764,"people_section":"Related people","alias":"surajitc"}],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967218","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":12,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967218\/revisions"}],"predecessor-version":[{"id":1108257,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/967218\/revisions\/1108257"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=967218"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=967218"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=967218"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=967218"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=967218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}