Data Mining and Enterprise Intelligence

Introduction: The Data Mining and Enterprise Intelligence (DMEI) group at Microsoft Research Asia (MSRA) was founded in August 2015. All researchers and RSDEs in the DMEI group came from MSRA's former machine learning group, with core expertise in text mining, knowledge extraction, semantic computing, information retrieval and general machine learning. The mission of the DMEI group is to mine knowledge from Web, enterprise and personal data, and to use machine learning to build intelligent agents that improve the productivity of enterprise users and their customers. In other words, in the name Data Mining and Enterprise Intelligence (DMEI), Data Mining stands for learning knowledge from text data, Enterprise means that we aim to improve the productivity of enterprise users, and Intelligence is showcased by AI agents equipped with knowledge.

Ongoing Projects


We divide our knowledge mining work into four steps, modeled on the lifelong process by which humans learn knowledge.

L1. From newborn to child – learning from daily life

  • There are two types of common knowledge that a newborn can learn from daily life:
  1. Common sense knowledge, such as “China-IsA-Country”.
  2. Shallow open-domain knowledge, such as “Avatar-Directed By-Cameron”.

Since much of Internet users' daily life takes place on the Web, we have two research projects, Conceptualization 1.0 and Kable 1.0, that learn these two types of common knowledge from the Web, respectively. Knowledge of this kind is generally open-domain and large in scale, but it lacks depth.
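To make the triple notation above concrete, here is a minimal Python sketch, assuming that both kinds of knowledge are stored as (subject, relation, object) triples. The `TripleStore` class and its methods are hypothetical and only illustrate the data shape; they are not the Conceptualization 1.0 or Kable 1.0 implementation.

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory store for (subject, relation, object) triples.

    Illustrates the data shape of the two L1 knowledge types only;
    not the Conceptualization or Kable implementation.
    """

    def __init__(self):
        # Index objects by (subject, relation) for direct lookup.
        self._index = defaultdict(set)

    def add(self, subject, relation, obj):
        self._index[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # .get() does not create a new key; sorting keeps output deterministic.
        return sorted(self._index.get((subject, relation), set()))


store = TripleStore()
# Common sense knowledge (an IsA relation).
store.add("China", "IsA", "Country")
# Shallow open-domain knowledge (a Directed By relation).
store.add("Avatar", "Directed By", "Cameron")

print(store.objects("China", "IsA"))
print(store.objects("Avatar", "Directed By"))
```

Keying the index on (subject, relation) makes "what is X?" and "who directed X?" single lookups; a real knowledge base at Web scale would also need reverse indexes and confidence scores, which this sketch omits.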

L2. From child to graduate student – learning from school education

  • Shallow open-domain knowledge alone cannot satisfy the requirements of professional agents. They need deep domain knowledge, which people acquire at school mainly by reading textbooks. The second step of our knowledge mining research therefore focuses on the document reading capability of learning algorithms. Document reading yields two types of knowledge:
  1. Memorized knowledge, such as frequently asked questions and their answers, which does not require deep understanding of text semantics but must be indexed and retrieved with high accuracy.
  2. Semantic knowledge graphs, such as entities with their attributes and relations, together with numerical representations of facts for knowledge computation, which require us to summarize and understand text semantics to support computation and inference.

Kable 2.0 and Conceptualization 2.0 are designed to read text documents and extract both types of knowledge, with Enterprise Dictionary as a showcase application. Conceptualization 2.0 also provides a way to integrate symbolic knowledge representation with distributed vector representation for knowledge computing.
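One well-known way to combine symbolic triples with distributed vectors is a TransE-style translation model, which embeds entities and relations as vectors and treats a triple as plausible when head + relation ≈ tail. The Python sketch uses that technique as a stand-in; it is not necessarily how Conceptualization 2.0 integrates the two representations. The entity "Spielberg", the hand-picked 2-D vectors, and the functions `transe_score` and `rank_tails` are all hypothetical.

```python
import math

# Hypothetical toy embeddings. A real system would learn these vectors
# from text; here they are hand-picked so the arithmetic is easy to follow.
ENTITY_VECS = {
    "Avatar": [1.0, 0.0],
    "Cameron": [1.0, 1.0],
    "Spielberg": [-1.0, 2.0],
}
RELATION_VECS = {
    "Directed By": [0.0, 1.0],
}


def transe_score(head, relation, tail):
    """TransE-style plausibility of a symbolic triple (head, relation, tail).

    Score = -||v(head) + v(relation) - v(tail)||_2, so values closer to 0
    mean the triple is more plausible.
    """
    h = ENTITY_VECS[head]
    r = RELATION_VECS[relation]
    t = ENTITY_VECS[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every other known entity as a candidate tail for (head, relation, ?)."""
    candidates = [e for e in ENTITY_VECS if e != head]
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


print(rank_tails("Avatar", "Directed By"))
```

Here Avatar + Directed By = [1, 1], which coincides with Cameron's vector (score 0), while Spielberg lies at distance √5, so Cameron ranks first. In practice the vectors would be trained so that observed triples score higher than corrupted ones.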

L3. From graduate to PhD – learning from people and advisors

  • After graduation, people generally learn from advisors to improve their professional skills, which can be thought of as a real-world PhD program. In our research, we design learning agents that actively learn from people to acquire personalized knowledge, which cannot be learned from the Web or from document reading. An advisor can teach a student in two ways:
  1. Explicit teaching. People explicitly tell agents what they know and answer the agents' questions.
  2. Implicit teaching. People demonstrate their behavior, and agents observe it to model the valuable knowledge it reflects.

Active knowledge learning using an active conversation model, and structured knowledge modeling with machine learning techniques, are within the scope of Kable 3.0 and Conceptualization 3.0, respectively.

L4. From PhD to expert – learning from work experience

  • Finally, the world's knowledge changes rapidly, and the real world is too complex to be captured in a knowledge base across all contexts. People keep growing their knowledge through work experience. Our research aims to let machines learn experiential, real-world knowledge in the same way. We have two research efforts in this step of knowledge learning:
  1. Learning from experience through agents.
  2. Learning to innovate something new (machine innovation).

The DMEI group's long-term research topics, including reinforcement learning for knowledge enhancement, hypothesis learning and metaphor learning, address these challenges.


Once machines have improved their capability for knowledge learning, we aim to use the learned knowledge to build intelligent agents for enterprise productivity. Within our research scope, there are three ways to improve the productivity of enterprise users: (1) assisting users with common tasks; (2) handling repetitive work on users' behalf; and (3) using agents to reduce users' communication costs. We rank these as high priority because they reflect real complaints from many enterprise users. In detail:

  1. Common task assistant. Enterprise users frequently perform many common tasks, such as travel planning, meeting scheduling, reimbursement, and IT or HR question answering. Not every full-time employee (FTE) in a company is an expert in these tasks, so an AI assistant can save a company considerable cost for every FTE who needs help with them. There is a long list of related work in both academia and industry aimed at providing intelligent assistants.
  2. Digital Me on behalf of me. Most FTEs in a company do repetitive work that draws on their expertise, such as replying to emails that ask similar questions or repeatedly collecting feedback from a long list of team members or customers. In addition, in many real-world scenarios a user's problem has no definitive answer. For example, different doctors may have different opinions on the symptoms of the same patient. In such cases, a common task agent cannot help users. We use our knowledge mining technology to help each person easily build her own agent, which actively learns from its owner. In this way, the personal agent, known as Digital Me, can help each person handle their work and express their personal opinions, improving the productivity of the whole organization.
  3. Agent society for communication. Communication cost is a pain point for many companies and organizations. For example, there are around 11 million meetings every day in the United States, and they waste 37 billion dollars every year. Most of this money is lost to the time a group of people spends coming to a shared understanding in a meeting. Once each person can build her own agent, the agent can hold a pre-meeting on the user's behalf to help her understand what the meeting is about. In addition, an agent can collect information from the agent society in seconds, saving people time in communication.