Data, Knowledge, and Intelligence

The goal of our research on data and knowledge is to democratize data intelligence to empower people and organizations to derive insights, learn and share knowledge, and build intelligence to turn data into action. Regardless of the various forms of data, understanding, generation, and interaction are the three common themes threading through the research topics on data and knowledge in different domains. Data understanding aims to achieve semantic understanding of various types of data. Data generation is targeted at automatic content generation based on users’ needs. Interaction with data aims to create unparalleled user experiences working with data.

Data, Knowledge, and Intelligence (DKI) is an interdisciplinary research area with active research in Artificial Intelligence (AI) and Machine Learning, Data Mining and Data Analytics, Knowledge Computing, Natural Language Processing (NLP), Information Visualization, and Software Analytics. Research in DKI is also grounded in practice: we draw data and inspiration from real problems in different domains, and we apply our research results to make real-world impact.

Data Analytics Research

We focus on understanding data, modeling the analysis process, and combining the two to automatically generate analytical artifacts, such as insights and analysis reports, from data. On the data side, we currently focus on representation learning and schematization of human-crafted data artifacts, especially (semi-)structured data such as tables and questionnaires. On the analytics side, we currently focus on insight mining, forecasting, and causal inference. The underlying technical pillars include machine learning, multi-dimensional data mining, explainable AI, and graph models, and application scenarios are exemplified by projects such as Spreadsheet Intelligence. Our key technologies have shipped, or are being shipped, with Office (Excel, Forms, Word), Power BI & Dynamics, and Bing Search.

Visualization and HCI

We are interested in a variety of research topics in information visualization, visual analytics, and human-computer interaction. By applying machine learning and AI techniques, we develop novel technologies, user interactions, and systems that lower the barriers to using visualizations effectively, so that users can improve their data analysis and communication. Specifically, we focus on visualization and infographics design and authoring, data-driven storytelling, visual analytics systems, novel user interfaces for data analysis and exploration, and data wrangling.

Natural Language Understanding for Data Science

Our mission is to advance natural language understanding technologies for intelligent data science. There are two motivations. First, a large portion of real-world data is unstructured natural language, so natural language understanding technologies are required to analyze and understand it. Second, natural language is the most intuitive interface for a data analysis tool, because it lets non-expert users explore and analyze data through natural language instructions. Specifically, we focus on semantic parsing, semantic role understanding, entity recognition, and dialog. To support this work, we also invest research effort in fundamental machine learning problems, such as the compositional generalization capabilities of DNN models and the sample efficiency of reinforcement learning. Our natural language understanding technologies have been shipped as an Excel feature (Excel Ideas) that allows non-expert users to analyze Excel tables using natural language. In collaboration with partner teams, we have also shipped our technologies to Power BI Q&A, Microsoft Bot Framework, Azure Text Analytics, and other products.

Knowledge Computing

The aim of the Knowledge Computing Group is to build machines that can make good use of knowledge to empower every person on the planet to achieve more. The group has four main focus areas: natural language processing, information extraction, table interpretation, and knowledge representation & reasoning. Natural language processing (NLP) analyzes, understands, and generates language for effective and efficient human-machine communication. Information extraction (IE) recognizes entity mentions, mention types, named entities, and entity-entity relations to create structured data from natural language text. Table interpretation (TI) detects column types, cell entities, and column-column relations to facilitate question answering over tables. Knowledge representation & reasoning (KRR) provides the foundation for NLP, IE, and TI by representing knowledge symbolically and enabling automated reasoning and computation over that representation.

Over the years, we have worked closely with our product team partners at Office 365, Azure Cognitive Services, and Bing to bring our research results into Microsoft products and services. Microsoft Forms Design Intelligence, PowerPoint Designer, Excel Data Types AutoDetect, Azure Text Analytics, Microsoft Video Indexer, and Microsoft Recognizers Text (open source) are a few recent examples that incorporate technologies developed by the Knowledge Computing Group.

Software Analytics

A wealth of data exists across the software lifecycle, including source code, feature specifications, bug reports, test cases, execution traces and logs, and real-world user feedback. This data plays an essential role in modern software development, because hidden in it is information about the quality of software and services as well as the dynamics of software development. Using analytical and computing technologies such as pattern recognition, machine learning, data mining, and large-scale data computing and processing, software analytics aims to enable software practitioners to explore and analyze data effectively and efficiently, so that they can obtain insightful and actionable information for data-driven tasks in engineering software and services. The mission of the Software Analytics Group at MSR Asia is to advance the state of the art in software analytics and to use our technologies to help improve the quality of software and services, as well as development productivity, for both Microsoft and the software industry.

In the past 10 to 15 years, cloud computing has become the most significant paradigm shift in the IT industry. In this context, we have in recent years expanded our research on how to innovate in AI and ML to solve the problems of cloud platforms, specifically, how to use AI/ML technologies to build and operate highly complex cloud services effectively and efficiently at scale. We call this Cloud Intelligence. It has three key pillars: AI for System/Infrastructure, AI for Customer, and AI for DevOps. Our key technologies have been transferred to multiple Microsoft cloud services, including Azure, Office 365, and Bing, where they have brought significant improvements.