Microsoft Research Blog


Microsoft researchers unlock the black box of network embedding

February 7, 2018 | By Kuansan Wang, Managing Director, MSR Outreach Academic Services

At the ACM Conference on Web Search and Data Mining 2018, my team will introduce research that, for the first time, provides a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks, known as network embedding. We then use this explanation to derive a new network-embedding method that performs as well as or better than existing methods.

Networks are fundamental ways of representing knowledge and relating to the world. As humans, we think through association; we naturally draw links between concepts or entities. People are linked through family relationships or through collaborations. Diseases are linked to treatments. Works of art are linked to their creators. Wikipedia represents the interconnectedness of human knowledge.

As humans, we have great capacity to understand knowledge, notice similarities and make connections and inferences. However, our capacity is limited by the amount of information we can process. Computers, on the other hand, can process tremendous amounts of information, but their ability to understand knowledge and make inferences is limited. That’s because humans think in symbols and computers think in math. Recent advances in computer science such as word and network embedding address this problem by using methods that translate discrete symbols into continuous representations such as high-dimensional vectors that computers can process algebraically.

Network embedding methods enable us to map the structure and characteristics of a network algorithmically, without human intervention. Mapping can be used to predict missing links, to identify missing entities and to make inferences that inform, for example, recommender systems on e-commerce websites.
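One such use is link prediction: once every node has a vector, a missing link can be scored by how well its two endpoints' vectors align. The sketch below uses made-up embeddings for four nodes (real vectors would come from a trained embedding model) and ranks candidate links by inner product:

```python
import numpy as np

# Hypothetical 2-d embeddings for four nodes of a network
# (illustrative values, not the output of any real model).
emb = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [0.1, 0.9]])

candidates = [(0, 2), (0, 3), (1, 2), (1, 3)]  # node pairs with no edge yet

# Score each candidate link by the inner product of its endpoints'
# embeddings; a higher score suggests the link is more likely to exist.
scores = {pair: float(emb[pair[0]] @ emb[pair[1]]) for pair in candidates}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # the most plausible missing link
```

A recommender system would surface the top-ranked pairs, for example as "products you may also like" or "people you may know."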

Existing methods of network embedding work empirically, but so far, scientists have considered them a black box: we lacked an understanding of how or why these methods work. Our recent research, a collaboration between Microsoft Research and Tsinghua University, contributes to the theoretical understanding of four popular network-embedding methods: DeepWalk, node2vec, LINE and PTE, the latter two of which were developed at Microsoft Research Asia.

Our research bridges the gap between the empirical performance of network embedding and its theoretical mechanism. We prove that the four popular network-embedding models are, in theory, implicitly performing matrix factorization over a network. More excitingly, the analysis also yields closed forms for the factorized matrices, most of which are related to the network's Laplacian matrix. These results unify the four models into a single matrix-factorization framework. As such, this work lays a theoretical foundation for network embedding and yields a better understanding of latent representation learning over networks.
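As a concrete instance (a sketch in the notation of our paper, where $A$ is the adjacency matrix, $D$ the diagonal degree matrix, $\operatorname{vol}(G)$ the sum of node degrees, $b$ the number of negative samples, and $T$ the skip-gram context window size), DeepWalk turns out to implicitly factorize the matrix

```latex
\log\!\left( \frac{\operatorname{vol}(G)}{bT} \left( \sum_{r=1}^{T} \left(D^{-1}A\right)^{r} \right) D^{-1} \right)
```

while LINE corresponds to the special case $T = 1$, that is, $\log\!\left(\frac{\operatorname{vol}(G)}{b}\, D^{-1} A D^{-1}\right)$.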

Theoretical understanding is important because it enables us to go beyond empirical trial and error. Once we understand why and how something works, we are better able to predict how it will behave and can use it more efficiently. Based on the theoretical understanding we derived from the four existing methods, we proposed a new algorithm, NetMF, that uses matrix factorization explicitly to create network embeddings. In our tests, the new algorithm performed comparably to or better than existing methods, providing further support for our theoretical explanation.
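A minimal sketch of this idea, on a toy graph, looks as follows. It builds the LINE-style closed-form matrix, applies the element-wise truncated logarithm so the log is defined, and factorizes the result with an SVD; the details (sparse matrices, approximation of the sum over the window, choice of dimension) differ in the real NetMF algorithm:

```python
import numpy as np

# Toy undirected graph on 4 nodes: adjacency matrix of a triangle
# (nodes 0, 1, 2) with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)                  # node degrees
vol = deg.sum()                      # volume of the graph
D_inv = np.diag(1.0 / deg)
b = 1.0                              # number of negative samples

# Closed-form matrix for LINE: log(vol(G) * D^-1 A D^-1 / b),
# with the log truncated at zero by clipping entries below 1.
M = vol * D_inv @ A @ D_inv / b
M = np.log(np.maximum(M, 1.0))       # element-wise truncated logarithm

# Explicit factorization by SVD; rows of U * sqrt(S) are the embeddings.
U, S, _ = np.linalg.svd(M)
dim = 2
embeddings = U[:, :dim] * np.sqrt(S[:dim])

print(embeddings.shape)              # one 2-d vector per node
```

Because the factorization is explicit rather than implicit, there is no sampling noise from random walks, which is one plausible reason the explicit method matches or exceeds the sampled ones.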

We use network embedding in the Microsoft Academic Graph, the largest knowledge graph of research publications in existence, accessible via the Academic Knowledge API and the Microsoft Academic website. As machines map the structure and meaning of entities in the graph and their relationships, they can begin making inferences. For example, they can infer that two papers are quite similar in content, or that two authors do similar work. In a way, this vector representation is like your genome sequence: if it is similar to someone else's, you are probably related. With network embedding, we can find academic twins who perhaps were not aware of each other. We can help researchers discover scholarship they might not have come across otherwise, and bridge gaps between traditional disciplinary divisions.
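The "academic twins" comparison comes down to a similarity measure between vectors, typically cosine similarity. The sketch below uses made-up three-dimensional embeddings for three papers (real vectors from the Microsoft Academic Graph are much higher-dimensional):

```python
import numpy as np

# Hypothetical embeddings for three papers (illustrative values only).
papers = {
    "paper_a": np.array([0.9, 0.1, 0.2]),
    "paper_b": np.array([0.8, 0.2, 0.3]),   # points the same way as paper_a
    "paper_c": np.array([-0.1, 0.9, 0.1]),  # points elsewhere
}

def cosine(u, v):
    """Cosine similarity: 1.0 means identical direction, ~0 unrelated."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_ab = cosine(papers["paper_a"], papers["paper_b"])
sim_ac = cosine(papers["paper_a"], papers["paper_c"])
print(sim_ab > sim_ac)  # paper_a and paper_b are the "twins"
```

Ranking all papers (or authors) by this score against a query vector is what lets the system surface related scholarship a researcher might not have found otherwise.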

The research team included Jiezhong Qiu (a summer intern at Microsoft Research), Yuxiao Dong, Hao Ma and Kuansan Wang from Microsoft Research, and Jian Li and Jie Tang from Tsinghua University.

Figure 1: The closed forms of the matrices that are implicitly approximated and factorized by popular network-embedding models.

 
