Jun Yan

Senior Research Manager


Dr. Jun Yan received his Ph.D. degree in digital signal processing and pattern recognition from the Department of Information Science, School of Mathematical Sciences, Peking University, P.R. China. During his Ph.D. studies, he was a research intern at MSRA from 2003 to 2005 and was named a Microsoft Fellow in 2004. Before joining Microsoft, he was a research associate at CBI, HMS, Harvard, Cambridge, MA, in 2005. He joined Microsoft Research Asia (MSRA) in 2006. He currently works in the Data Mining and Enterprise Intelligence group of MSRA as a senior research manager.

His research interests include knowledge mining for AI, text data preprocessing, information retrieval, and behavioral targeted online advertising. So far, he has successfully incubated tens of technologies that have been used in Microsoft products. In academia, he has published more than 60 quality papers in refereed conferences and journals, including SIGKDD, SIGIR, WWW, ICDM, and TKDE. He has served as a PC member of international conferences such as SIGKDD and SIGIR, and as a reviewer for journals such as TKDE and TPAMI.




Research Interests

  • Large scale Web knowledge extraction and mining
  • Behavioral targeted online advertising
  • Large scale data preprocessing
  • Machine learning for information retrieval
  • User modeling and understanding

Selected Projects

  • Knowledge Table. Kable, the Knowledge Table project, aims to automatically extract structured domain knowledge from the semi-structured and unstructured World Wide Web, then process and store that knowledge in table format, where each row stands for a domain entity and each column stands for an attribute. The cells in Kable are the attribute values of the corresponding entity-attribute pairs. Constructing this kind of structured knowledge base is important for OSD applications such as Bing search, paid search, and display ads. The Kable research concept map has three layers: the Data Layer, the Model Layer, and the Application Layer.
  • Intent-based behavioral targeting project. Description: this project aims to sell “intents” to advertisers in behavioral targeted advertising. We classify user search behaviors into user intent categories, based on which we can accurately deliver ads to the audience. In this project, I mainly focus on algorithm design and on driving the cross-group research efforts within MSRA.
  • Online ad relevance verification project. Description: this project aims to improve ad relevance in Bing paid search. We propose novel features and a classifier to improve ad relevance from a machine learning perspective. In this project, I mainly focus on algorithm design and feature proposal, and lead the research efforts within MSRA.
  • Bing search task classification project. Description: this project aims to understand whether Bing search users intend to compare Web objects in the sports domain. When they do, Bing returns a side-by-side comparison without requiring users to browse the 10 blue links. We propose a classification solution that delivers satisfactory performance to online users. In this project, I work together with the product team to design and transfer the intent classifier.
  • Self-service BT prototyping. Description: this project aims to let advertisers customize their user segments for ad delivery. We propose a MinHash-based user clustering solution and implement the prototype. In this project, I mainly focus on scenario design, algorithm design, and leading the team-wide research efforts within MSRA.
  • Office Online assets recommendation project. Description: this project aims to recommend assets likely to interest “Office Online” users, based on similar users’ behaviors. We develop the algorithm for this online recommendation and transfer the technology to the Office Online AP team. In this project, I mainly focus on algorithm design and on driving the research efforts within MSRA.


  • Indexing Semantic User Profiles for Targeted Advertising
  • Web Knowledge Extraction for Search Task Simplification
  • Build of Website Knowledge Tables
  • Forecasting Search Queries based on Time Dependencies (Appl. No. 11/770,462)
  • Clustering Aggregator for RSS feeds (Appl. No. 20090327320)
  • Prediction of Future Popularity of Query Terms (Appl. No. 20090222321)
  • Categorizing Online User Behavior Data (MS# 327757.01)
  • Representing Queries and Determining Similarity based on An ARIMA Model (Appl. No. 20090006326)
  • Identification of Events of Search Queries (Appl. No. 11/770,423)
  • Forecasting Time-Dependent Search Queries (Appl. No. 11/770,385)
  • Learning Latent Semantic Space for Ranking
  • Identification of Similar Queries based on Overall and Partial Similarity of Time Series
  • Determination of Time Dependency of Search Queries (Appl. No. 11/770,358)
  • Forecasting Time Independent Search Queries (Appl. No. 11/770,445)
  • Scalable Parallel User Clustering in Discrete Time Window (Appl. No. 20100169258)
  • Learning User Intent from Rule-based Training Data (MS# 329229.01)
  • Related Links Recommendation (MS# 329226.01)


Yong Luo, Jian Tang, Jun Yan, Chao Xu, Zheng Chen, Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network, AAAI, 2014

Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, Jiawei Han, Heterogeneous graph-based Intent Learning with Queries, Web Pages and Wikipedia Concepts, WSDM, 2014

Xingxing Zhang, Jianwen Zhang, Junyu Zeng, Jun Yan, Zheng Chen, Zhifang Sui, Towards Accurate Distant Supervision for Relational Facts Extraction, ACL, 2013