Microsoft Research Blog

Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Microsoft researchers unlock the black box of network embedding

February 7, 2018 | By Kuansan Wang, Managing Director, MSR Outreach Academic Services

At the ACM Conference on Web Search and Data Mining 2018, my team will introduce research that, for the first time, provides a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks, known as network embedding. We then use this theoretical explanation to present a new network embedding method that performs as well as or better than existing methods.

Networks are fundamental ways of representing knowledge and relating to the world. As humans, we think through association; we naturally draw links between concepts or entities. People are linked through family relationships or through collaborations. Diseases are linked to treatments. Works of art are linked to their creators. Wikipedia represents the interconnectedness of human knowledge.

As humans, we have great capacity to understand knowledge, notice similarities and make connections and inferences. However, our capacity is limited by the amount of information we can process. Computers, on the other hand, can process tremendous amounts of information, but their ability to understand knowledge and make inferences is limited. That’s because humans think in symbols and computers think in math. Recent advances in computer science such as word and network embedding address this problem by using methods that translate discrete symbols into continuous representations such as high-dimensional vectors that computers can process algebraically.

Network embedding methods enable us to map the structure and characteristics of a network algorithmically, without human intervention. Mapping can be used to predict missing links, to identify missing entities and to make inferences that inform, for example, recommender systems on e-commerce websites.

Existing methods of network embedding work empirically, but so far, scientists have considered them a black box: We lacked understanding of how or why these methods work. Our recent research—a collaboration between Microsoft Research and Tsinghua University—contributes to the theoretical understanding of four popular network-embedding methods, including DeepWalk, node2vec, as well as LINE and PTE, which were developed at Microsoft Research Asia.

Our research bridges the gap between the empirical performance and theoretical mechanism of network embedding. We prove that four popular network-embedding models are —in theory— implicitly performing matrix factorizations over a network. More excitingly, the theoretical analysis also provides the closed forms of those factorized matrices, most of which are related to the network’s Laplacian matrix. We show that these models share a common practice: they are unified into a general matrix-factorization framework. As such, this work lays the theoretical foundation for network embedding, yielding a better understanding of latent representation learning over networks.

Theoretical understanding is important because it enables us to go beyond empirical attempts. Once we understand why and how something works, we are better able to predict how it will work in the future and can use it more efficiently. Based on the new theoretical understanding we derived from four existing methods, we proposed a new algorithm, NetMF, that uses matrix factorization explicitly to create network embeddings. The new algorithm performed at comparative or better levels than existing methods in our tests, providing further support for our theoretical explanation.

We use network embedding in the Microsoft Academic Graph – the largest knowledge graph of research publications in existence, accessible via the Academic Knowledge API and the Microsoft Academic website. As machines map the structure and meaning of entities in the graph and their relationships, they can begin making inferences. For example, they can infer that two papers are quite similar in content, or two authors do similar work. In a way, this vector representation is like your genome sequence. If it is similar to someone else’s, you must be related. With network embedding, we can find academic twins that perhaps were not aware of each other. We can help researchers discover scholarship they might not have come across otherwise, and bridge gaps between traditional disciplinary divisions.

The research team included Jiezhong Qiu (a summer intern at Microsoft Research), Yuxiao Dong, Hao Ma, Kuansan Wang from Microsoft Research, and Jian Li and Jie Tang from Tsinghua University.

Figure 1: The closed forms of the matrices that are implicitly approximated and factorized by popular network-embedding models.


Up Next

Data platforms and analytics

Microsoft Investigator fellows accelerate scientific and teaching impact with Azure cloud computing

I am pleased to announce the winners of the new Microsoft Investigator Fellowship. This fellowship is designed to empower researchers of all disciplines who plan to make an impact with research and teaching using the Microsoft Azure cloud computing platform. Each fellowship provides $100,000 USD annually for two years and various training and community events. […]

Jamie Harper

Vice-President, US Education

Data platforms and analytics, Systems and networking

Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud

The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray’s fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to […]

Vani Mandava

Director, Data Science Outreach

Data platforms and analytics

Microsoft and Tsinghua University Work Together on Open Academic Data Research

In a recent collaboration, Microsoft and China’s Tsinghua University released an academic graph, named Open Academic Graph (OAG). This billion-scale academic graph integrates the current Microsoft Academic Graph (MAG) and Tsinghua’s AMiner academic graph. Specifically, it contains the metadata information of 155 million academic paper metadata from AMiner and 166 million papers from MAG. By […]

Microsoft blog editor