I am Partner Research Manager in Business AI at Microsoft AI & Research. From 2014 to 2017, I was Partner Research Manager at the Deep Learning Technology Center (DLTC) at Microsoft Research, Redmond. I lead the development of AI solutions for Predictive Sales and Marketing. I also work on deep learning for text and image processing (see our IJCAI 2016 Tutorial or MS internal site) and lead the development of AI systems for dialogue, machine reading comprehension (MRC), and question answering (QA).

We are hiring Researchers with strengths in ML and NLP, and Software Engineers with rich product experience.

DSSM [project site]: We have developed a series of deep semantic similarity models (DSSM, also known as Sent2Vec), which have been used for many text and image processing tasks, including web search [Huang et al. 2013, Shen et al. 2014], recommendation [Gao et al. 2014a], machine translation [Gao et al. 2014b], and QA [Yih et al. 2015].

MRC [project site]: We released a new MRC dataset, MS MARCO, and have developed a series of reasoning networks for MRC, namely ReasoNet and ReasoNet with shared memory.

Dialogue: We have developed neural network models for social bots trained on Twitter data [project site] and task-completion bots [project site] trained via reinforcement learning using a user simulator.

From 2006 to 2014, I was Principal Researcher in the Natural Language Processing Group at Microsoft Research, Redmond. I worked on Web search, query understanding and reformulation, ads prediction, and statistical machine translation.

From 2005 to 2006, I was Research Lead in the Natural Interactive Services Division at Microsoft. I worked on Project X, an effort to develop a natural user interface for Windows.

From 1999 to 2005, I was Research Lead in the Natural Language Computing Group at Microsoft Research Asia. Together with my colleagues, I developed the first Chinese speech recognition system released with Microsoft Office, the Chinese/Japanese Input Method Editors (IME), which were the leading products in the market, and the natural language platform for Windows Vista.

Currently, I live with my family in Woodinville, WA.


Vision and Language Intelligence

Established: June 28, 2017

This project aims at driving disruptive advances in vision and language intelligence. We believe future breakthroughs in multimodal intelligence will empower smart communications between humans and the world and enable next-generation scenarios such as a universal chatbot and intelligent augmented reality. To these ends, we are focusing on understanding, reasoning, and generation across language and vision, and creation of intelligent services, including vision-to-text captioning, text-to-vision generation, and question answering/dialog about images and videos.

Deep Learning for Machine Reading Comprehension

Established: September 1, 2016

The goal of this project is to teach a computer to read a document and answer general questions about it. We recently released a large-scale MRC dataset, MS MARCO. We developed a ReasoNet model to mimic the inference process of human readers. With a question in mind, ReasoNets read a document repeatedly, each time focusing on different parts of the document until a satisfying answer is found or formed. The extension of ReasoNet (ReasoNet-Memory)…

JointSLU: Joint Semantic Frame Parsing for Spoken Language Understanding

Sequence-to-sequence deep learning has recently emerged as a new paradigm in supervised learning for spoken language understanding. However, most of the previous studies explored this framework for building single domain models for each task, such as slot filling or domain classification, comparing deep learning based approaches with conventional ones like conditional random fields. This project focuses on a holistic multi-domain, multi-task (i.e. slot filling, domain and intent detection) modeling approach to estimate complete semantic frames…

MS-Celeb-1M: Challenge of Recognizing One Million Celebrities in the Real World

Established: June 29, 2016

MSR Image Recognition Challenge (IRC) @ACM Multimedia 2016

Important Dates/Updates:

  • New! We are hosting new challenges at ICCV 2017. Visit MsCeleb.org for more details.
  • Participants information disclosed in "Team Information" section below
  • 6/21/2016: Evaluation Result Announced in "Evaluation Result" section below.
  • 6/17/2016: Evaluation finished. 14 teams finished the grand challenge!
  • 6/13/2016: Evaluation started.
  • 6/13/2016: Dry run finished, 14 out of 19 teams passed, see details in "Update Details" below
  • 6/10/2016: Dry run update 3: 8 teams…

Deep Reinforcement Learning for Goal-Oriented Dialogues

Established: April 18, 2016

This project aims to develop intelligent dialogue agents that help users accomplish tasks effectively via natural language conversation. A typical goal-oriented dialogue system contains three major components: natural language understanding (NLU), natural language generation (NLG), and dialogue management (DM), which consists of state tracking and policy learning. Our research focuses on deep reinforcement learning approaches to dialogue management in goal-oriented dialogue settings such as movie ticket booking, trip planning, and sales assistance. User Simulator Training…
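The component breakdown above (NLU, state tracking, policy, NLG) can be pictured with a small toy pipeline. This is only a sketch under stated assumptions: every function name, the slot names, the example movie titles, and the response templates are hypothetical, and the policy is a hand-written rule, whereas the project learns the policy with deep reinforcement learning.

```python
REQUIRED_SLOTS = ("movie", "tickets")          # hypothetical slots for ticket booking
KNOWN_MOVIES = ("zootopia", "deadpool")        # hypothetical example catalogue
REQUEST_TEMPLATES = {
    "movie": "Which movie would you like to see?",
    "tickets": "How many tickets do you need?",
}


def nlu(utterance):
    """Toy NLU: pull slot values out of the utterance by keyword lookup."""
    slots = {}
    for token in utterance.lower().split():
        if token in KNOWN_MOVIES:
            slots["movie"] = token
        elif token.isdigit():
            slots["tickets"] = int(token)
    return slots


def track_state(state, slots):
    """State tracking: merge newly observed slots into the dialogue state."""
    updated = dict(state)
    updated.update(slots)
    return updated


def policy(state):
    """Policy: request the first missing slot, otherwise confirm the booking.
    A learned policy would replace this hand-written rule."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return ("request", slot)
    return ("confirm", None)


def nlg(action, state):
    """Toy NLG: turn the chosen system action into a template response."""
    kind, slot = action
    if kind == "request":
        return REQUEST_TEMPLATES[slot]
    return f"Booking {state['tickets']} ticket(s) for {state['movie']}."


def run_turn(state, utterance):
    """Process one user turn and return (new_state, system_reply)."""
    state = track_state(state, nlu(utterance))
    return state, nlg(policy(state), state)


if __name__ == "__main__":
    state = {}
    for user_turn in ("I want to see Zootopia", "2 please"):
        state, reply = run_turn(state, user_turn)
        print(reply)
```

Keeping the policy as a separate function that maps a state to an action is what would let a rule like this be swapped for a learned one without touching the other components.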

From Captions to Visual Concepts and Back

Established: April 9, 2015

We introduce a novel approach for automatically generating image descriptions. Visual detectors, language models, and deep multimodal similarity models are learned directly from a dataset of image captions. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1%. Human judges consider the captions to be as good as or better than human-written captions 34% of the time.


DSSM

Established: January 30, 2015

The goal of this project is to develop a class of deep representation learning models. DSSM stands for Deep Structured Semantic Model or, more generally, Deep Semantic Similarity Model. DSSM, developed by the MSR Deep Learning Technology Center (DLTC), is a deep neural network (DNN) modeling technique for representing text strings (sentences, queries, predicates, entity mentions, etc.) in a continuous semantic space and modeling semantic similarity between two text strings (e.g., Sent2Vec). DSSM has wide applications including information retrieval…
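The idea of mapping two text strings into a shared vector space and scoring their similarity can be illustrated with a very small sketch. It is not the DSSM implementation: letter-trigram counts stand in for the learned DNN representation, cosine similarity is assumed as the score, and the names `letter_trigrams`, `embed`, and `cosine` are hypothetical.

```python
import math
from collections import Counter


def letter_trigrams(word):
    """Split one word into letter trigrams, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def embed(text):
    """Map a text string to a sparse bag-of-trigrams vector.
    Hypothetical stand-in for a learned semantic representation."""
    counts = Counter()
    for word in text.lower().split():
        counts.update(letter_trigrams(word))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    query = embed("deep learning")
    print(cosine(query, embed("deep learning models")))  # shares many trigrams
    print(cosine(query, embed("movie tickets")))         # shares no trigrams
```

In a learned model the `embed` step would be a trained network, so that strings with related meaning but little surface overlap can still score as similar; the sketch above only captures surface overlap.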

Data-Driven Conversation

Established: June 1, 2014

This project aims to enable people to converse with their devices. We are trying to teach devices to engage with humans using human language in ways that appear seamless and natural to humans. Our research focuses on statistical methods by which devices can learn from human-human conversational interactions and can situate responses in the verbal context and in physical or virtual environments. Natural and Engaging Agents that process human language will play a growing role…


Statistical Parsing and Linguistic Analysis Toolkit

Established: April 4, 2012

Statistical Parsing and Linguistic Analysis Toolkit is a linguistic analysis toolkit. Its main goal is to allow easy access to the linguistic analysis tools produced by the Natural Language Processing group at Microsoft Research. The tools include both traditional linguistic analysis tools, such as part-of-speech taggers and parsers, and more recent developments, such as sentiment analysis (identifying whether a particular piece of text has positive or negative sentiment towards its focus). Demo URL: You can find…

Microsoft Research ESL Assistant

Established: May 9, 2008

The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice of prepositions. The web service also provides word choice suggestions from a thesaurus. In order to help the user make decisions on whether to accept a suggestion, the service displays "before and after" web search…





From Captions to Visual Concepts and Back
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John Platt, Larry Zitnick, Geoffrey Zweig, in Proceedings of CVPR, IEEE, June 1, 2015

Deep Learning for Text Processing


August 4, 2014


Li Deng, Eric Xing, Xiaodong He, Jianfeng Gao, Christopher Manning, Paul Smolensky, and Jeff A Bilmes


MSR, Carnegie Mellon University, Microsoft Research, Redmond, MSR Redmond, Stanford, Johns Hopkins University, University of Washington


UW/MS symposium


June 6, 2008


Danyel Fisher, Douglas Downey, Chris Quirk, Scott Drellishak, Kelly O'Hara, Emily M. Bender, Sumit Basu, Matthew Hurst, Arnd Christian König, Michael Gamon, Chris Brockett, Dmitriy Belenko, Bill Dolan, Jianfeng Gao, and Lucy Vanderwende


Scalable Language-Model-Building Tool

October 2010

This scalable language-model tool is used to build language models from large amounts of data. It supports modified absolute discounting and Kneser-Ney smoothing. The tool has been used successfully to build a seven-gram language model on 40 billion words within eight hours.
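To make the smoothing terminology concrete, here is a minimal sketch of textbook interpolated Kneser-Ney smoothing for bigrams with a single absolute discount. It is not the tool's implementation: the function name `train_kneser_ney`, the default discount of 0.75, and the restriction to bigrams are assumptions for illustration, and the tool's "modified" discounting scheme is not reproduced here.

```python
from collections import Counter, defaultdict


def train_kneser_ney(tokens, discount=0.75):
    """Return prob(word, prev): interpolated Kneser-Ney bigram probabilities
    estimated from a list of tokens (single fixed discount, assumed value)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    if not bigrams:
        raise ValueError("need at least two tokens to estimate bigrams")

    context_counts = Counter()        # how often prev occurs as a bigram context
    followers = defaultdict(set)      # distinct words seen after prev
    predecessors = defaultdict(set)   # distinct words seen before word
    for (prev, word), count in bigrams.items():
        context_counts[prev] += count
        followers[prev].add(word)
        predecessors[word].add(prev)
    num_bigram_types = len(bigrams)

    def continuation(word):
        """Share of distinct bigram types that end in this word."""
        return len(predecessors.get(word, ())) / num_bigram_types

    def prob(word, prev):
        total = context_counts.get(prev, 0)
        if total == 0:
            # Unseen context: fall back entirely to the continuation distribution.
            return continuation(word)
        # Subtract the discount from the observed count (absolute discounting)...
        discounted = max(bigrams.get((prev, word), 0) - discount, 0.0) / total
        # ...and give the freed probability mass to the continuation distribution.
        backoff_weight = discount * len(followers[prev]) / total
        return discounted + backoff_weight * continuation(word)

    return prob


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    p = train_kneser_ney(corpus)
    for w in sorted(set(corpus)):
        print(w, round(p(w, "the"), 4))
```

The lower-order distribution counts how many distinct contexts a word follows rather than how often it occurs, which is the feature that distinguishes Kneser-Ney from plain absolute discounting with a unigram backoff.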

Size: 11 MB


Bayesian Estimators for Unsupervised HMM Part-of-Speech Tagger

August 2009



February 2008


NLP Data Sets for Comparative Study of Parameter-Estimation Methods

June 2007
