{"id":732823,"date":"2021-03-12T00:49:57","date_gmt":"2021-03-12T08:49:57","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=732823"},"modified":"2021-05-28T23:37:18","modified_gmt":"2021-05-29T06:37:18","slug":"metainsight-automatic-discovery-of-structured-knowledge-for-exploratory-data-analysis","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/metainsight-automatic-discovery-of-structured-knowledge-for-exploratory-data-analysis\/","title":{"rendered":"MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis"},"content":{"rendered":"<p>Automatic Exploratory Data Analysis (EDA) focuses on automatically discovering pieces of knowledge in the form of interesting data patterns. However, the knowledge conveyed by these suggested data patterns is disjointed or lacks organization. Therefore, it is difficult for users to gain structured knowledge, and as the number of suggested patterns grows, these stand-alone patterns are less likely to motivate users to conduct follow-up analysis, which hinders them from being effectively utilized to facilitate EDA. In this paper, we propose MetaInsight, a structured representation of knowledge extracted from multi-dimensional data that aims to facilitate EDA automatically and effectively. Specifically, we propose a novel formulation of the basic data pattern that captures essential characteristics of the raw data distribution to achieve knowledge extraction. Then, based on the mined Homogeneous Data Patterns (HDP) and inter-pattern similarity, a MetaInsight is identified by categorizing the basic data patterns (within an HDP) into commonness(es) and exceptions, thus achieving structured knowledge representation. The commonness(es) and exceptions concretize the knowledge obtained by the induction and validation processes, which are two typical analysis mechanisms conducted in EDA. 
We propose a novel scoring function to quantify the usefulness of a MetaInsight, together with an effective and efficient mining procedure and a ranking algorithm, to automatically discover high-quality MetaInsights from multi-dimensional data. We demonstrate the effectiveness and efficiency of MetaInsights (w.r.t. facilitating EDA) through evaluation on real-world datasets and user studies with both expert and non-expert users.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Automatic Exploratory Data Analysis (EDA) focuses on automatically discovering pieces of knowledge in the form of interesting data patterns. However, the knowledge conveyed by these suggested data patterns is disjointed or lacks organization. Therefore, it is difficult for users to gain structured knowledge, and as the number of suggested patterns grows, these stand-alone patterns are [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"Association for Computing Machinery","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"ACM SIGMOD International Conference on Management of 
Data","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2021-6-20","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/2021.sigmod.org\/calls_papers_sigmod_research.shtml","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13563],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[251584,251194],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-732823","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-data-platform-analytics","msr-locale-en_us","msr-field-of-study-exploratory-data-analysis","msr-field-of-study-knowledge-extraction"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2021-6-20","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Association for Computing 
Machinery","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/rdm337-maA.pdf","id":"739741","title":"rdm337-maa","label_id":"243109","label":0},{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/metainsight-extended.pdf","id":"749689","title":"metainsight-extended","label_id":"243103","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":749689,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/metainsight-extended.pdf"},{"id":739741,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/04\/rdm337-maA.pdf"}],"msr-author-ordering":[{"type":"text","value":"Pingchuan Ma","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Justin Ding","user_id":32435,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Justin Ding"},{"type":"user_nicename","value":"Shi Han","user_id":33618,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shi Han"},{"type":"user_nicename","value":"Dongmei Zhang","user_id":31665,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dongmei 
Zhang"}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[],"msr_group":[714577],"msr_project":[558663,338930],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":558663,"post_title":"Spreadsheet Intelligence","post_name":"spreadsheet-intelligence","post_type":"msr-project","post_date":"2019-01-06 17:18:03","post_modified":"2022-04-24 01:24:49","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/spreadsheet-intelligence\/","post_excerpt":"At Microsoft Research Asia, this is the umbrella research project behind Ideas in Excel of Microsoft Office 365 product.\u00a0With successful technology transfers via close collaboration with Excel teams,\u00a0this intelligent\u00a0feature has been announced at Microsoft Ignite 2019 Conference and released with General Availability on March 1, 2019. There are following sub- or related research projects on some fundamental technology pillars, respectively. They jointly enable such one-click intelligence of Ideas in Excel. TableSense: table range detection\u00a0and table&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/558663"}]}},{"ID":338930,"post_title":"QuickInsights","post_name":"quickinsights","post_type":"msr-project","post_date":"2016-12-19 17:35:38","post_modified":"2019-02-10 16:52:52","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/quickinsights\/","post_excerpt":"welcome! Insight Types Specification supplementary materials of QuickInsights (full) Insight Types Specification: contains the details about insight type definition as well as the significance calculation. 
supplementary materials of QuickInsights: includes the proof of impact FD induced trivial insights time complexity of FD checker step-by-step to run QuickInsights in Power BI &nbsp; Datasets: CarSales Census Emission Sample Questionnaire: Questionnaire - Carsales Questionnaire - Movie if you have further questions or requests, please contact juding@microsoft.com &nbsp;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/338930"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/732823","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":2,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/732823\/revisions"}],"predecessor-version":[{"id":749698,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/732823\/revisions\/749698"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=732823"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=732823"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=732823"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=732823"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=732823"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json
\/wp\/v2\/msr-focus-area?post=732823"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=732823"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=732823"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=732823"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=732823"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=732823"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=732823"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=732823"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}