Xiaodong He

Principal Researcher, Research Manager


Xiaodong He is a Principal Researcher in the Deep Learning Technology Center of Microsoft Research, Redmond, WA, USA. He is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in artificial intelligence, including deep learning, natural language processing, computer vision, speech, information retrieval, and knowledge representation.

He has published more than 100 papers in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS, ICLR, ICASSP, Proc. IEEE, IEEE TASLP, IEEE SPM, and other venues. He has received several awards, including the Outstanding Paper Award at ACL 2015. He led the development of the MSR-NRC-SRI entry and the MSR entry, which won first place in the 2008 NIST Machine Translation Evaluation and the 2011 IWSLT Evaluation (Chinese-to-English), respectively. He is also a co-inventor of the DSSM (2013, 2014a, 2014b), which has been broadly applied to language, vision, IR, and knowledge representation tasks. More recently, he and his colleagues developed the MSR image captioning system, which achieved the highest score in the Turing test and won first prize, tied with Google, at the COCO Captioning Challenge 2015. His work was reported by Communications of the ACM in January 2016. He leads the image captioning effort, which is now part of Microsoft Cognitive Services and CaptionBot. The work was widely covered in media including Business Insider, TechCrunch, Forbes, The Washington Post, CNN, and BBC. The services also support applications such as Seeing AI, Microsoft Word, and PowerPoint.

He has held editorial positions on several IEEE journals, served as an area chair for NAACL-HLT 2015, and served on the organizing and program committees of major speech and language processing conferences. He is an elected member of the IEEE SLTC for the 2015-2017 term. He is a senior member of IEEE and a member of ACL. He was elected Chair of the IEEE Seattle Section for 2016.

He received a bachelor's degree from Tsinghua University (Beijing) in 1996, an MS degree from the Chinese Academy of Sciences (Beijing) in 1999, and a PhD degree from the University of Missouri – Columbia in 2003.


The Image Captioning Service starts serving Microsoft Office users: Word and PowerPoint will use AI to automatically write photo descriptions. More on Office blogs, The Verge, and VentureBeat.
CaptionBot (http://CaptionBot.ai) has attracted users all over the world, who have sent in millions of pictures for captioning. Lots of fun stories are shared at Business Insider, TechCrunch, Engadget, The Washington Post, Forbes, CNN, Gizmodo, BBC, The Telegraph, Daily Mail, The Guardian, Mashable, and more (tech coverage summarized here).
Seven long papers accepted by NAACL, ACL, and EMNLP in 2016 (509 in total) on deep learning and reinforcement learning for playing text-based games and understanding Reddit, question answering, parsing, vision and language, sentiment and topic classification, and understanding commonsense stories. Plus other publications at CVPR (oral), ECCV, WWW, SIGIR, ICLR, NAACL (short), ICASSP, and IEEE-TASLP.
Visited the UC Berkeley Vision group in April and gave a talk on “Multimodal Learning for Image Captioning and Visual Question Answering” at BLVC.
Communications of the ACM interviewed Fei-Fei Li, Rob Fergus, Richard Zemel, and me on recent progress in computer vision and language processing, reported in “Seeing More Clearly” in the Jan. 2016 issue.
Elected 2016 Chair of the IEEE Seattle Section (featured in “UW/EE Faculty Elected IEEE Seattle Section Officers”).
Business Insider reported our deep image question answering work in “Microsoft Research creates a multi-step reasoning computer.” Also covered by ZDNet, eWeek, and others. The paper was accepted as an oral presentation at CVPR 2016.
Invited talk: Towards Human-level Quality Image Captioning: Deep Semantic Learning of Text and Images, at Deep Learning Workshop, San Francisco, August 8, 2015.
Our MSR entry won 1st Prize, tied with Google, at the MS COCO Captioning Challenge 2015, and achieved the highest score in the Turing Test among all submissions. More details are in the CVPR paper, demo, relevant talk, and recent media coverage by Microsoft Blog, TechNet, SlashGear, Engadget, VentureBeat, and Android Headlines.
Amittai Axelrod received his PhD from UW in August 2014 (co-advised with Mari Ostendorf; group photo and ceremony). Congratulations!

Invited talks, tutorials, and code release

Released the predictors, trained models, and trainer source code of the DSSM (Deep Structured Semantic Model), which projects natural language sentences to semantic vectors (Sent2Vec).
Xiaodong He, Deep Learning for Natural Language Processing, part A & part B. One-day lecture at the ML Summer School, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen). August 4th, 2016.
Xiaodong He, Multimodal Learning for Image Captioning and Visual Question Answering at BLVC, UC Berkeley, 2016.
Wen-tau Yih, Xiaodong He, and Jianfeng Gao, Deep Learning and Continuous Representations for NLP (Tutorial for IJCAI-2016), 9 July 2016.
Xiaodong He, Towards Human-level Quality Image Captioning: Deep Semantic Learning of Text and Images (Invited Talk), Deep Learning Workshop, August 2015.
Xiaodong He, Deep Semantic Learning: Teach machines to understand text, image, and knowledge graph (Invited talk at CVPR DeepVision workshop), June 2015.
Wen-tau Yih, Xiaodong He, and Jianfeng Gao, Deep Learning and Continuous Representations for NLP (Tutorial for NAACL-HLT-2015), (slides, video), 31 May 2015.
Xiaodong He and Wen-tau Yih, Deep Learning and Continuous Representations for Language Processing (Tutorial for IEEE-SLT-2014), IEEE Spoken Language Technology (SLT), December 2014.
Xiaodong He, Jianfeng Gao, and Li Deng, Deep Learning for Natural Language Processing: Theory and Practice (Tutorial), ACM International Conference on Information and Knowledge Management (CIKM), November 2014.
Xiaodong He, Towards Deep Understanding: Deep Learning for Selected Natural Language Applications. Invited talk at the UW/EE Research Colloquium Series, University of Washington, Seattle, October 2014 (Lecture Slides, Video).
Xiaodong He, Jianfeng Gao, and Li Deng, Deep Learning for Natural Language Processing and Related Applications (Tutorial), IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2014.
Bowen Zhou and Xiaodong He, Speech Translation: Theory and Practices (Tutorial), IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2013.
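The DSSM release listed above maps sentences to semantic vectors; in the DSSM papers, the relevance of a document to a query is scored by the cosine similarity between their two vectors. The sketch below illustrates only that scoring step, under stated assumptions: the vectors are made-up placeholders rather than DSSM output, and `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not part of the released DSSM code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Hypothetical helper: return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder vectors (assumption); a real system would obtain them from a trained encoder.
query = [0.9, 0.1, 0.0]
docs = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
    "doc_c": [0.5, 0.5, 0.0],
}
print(rank_by_similarity(query, docs))  # ['doc_a', 'doc_c', 'doc_b']
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for scoring learned semantic vectors against each other.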


Academic services

  • Member of the IEEE Speech and Language Processing Technical Committee 2015-2017
  • Area Chair, Spoken Language Processing, NAACL 2015
  • Associate Editor, IEEE Signal Processing Letters since 2014
  • Member of the Organizing Committee, Chair of Special Sessions, IEEE ICASSP 2013
  • Associate Editor, IEEE Signal Processing Magazine since 2012
  • Guest Editor, Special Issue on Continuous-space and related methods in natural language processing, in IEEE Transactions on Audio, Speech, and Language Processing, 2014
  • Guest Editor, Special Issue on Large-Scale Optimization for Audio, Speech, and Language Processing, in IEEE Transactions on Audio, Speech, and Language Processing, 2013
  • Lead Guest Editor, Special Issue on Statistical Learning Methods for Speech and Language Processing, in IEEE Journal of Selected Topics in Signal Processing, 2010
  • Co-Chair, NIPS 2008 Workshop on Speech and Language: Learning-Based Methods and Systems, Whistler, BC, Canada, 2008
  • Grant Reviewer: Swiss National Science Foundation
  • Program Committee Member: ACL, NAACL, EMNLP, COLING, AAAI
  • Reviewer: IEEE Transactions on Speech and Audio Processing, Proceedings of the IEEE, IEEE Signal Processing Magazine, IEEE Signal Processing Letters, IEEE Transactions on Computer, Speech Communication, Pattern Recognition, Pattern Recognition Letters, ICASSP, Interspeech, NIPS

Honors and awards

  • ACL 2015 Outstanding Paper Award
  • 1st Prize, MS COCO Captioning Challenge 2015
  • No. 1 Place, Chinese to English MT track, 2011 IWSLT Evaluation
  • No. 1 Place, Chinese to English common data track, 2008 NIST MT Evaluation
  • ICASSP 2011 Best Student Paper Award (co-author)
  • IEEE senior member since 2008
  • Microsoft Gold Star Award, 2005
  • Microsoft Patent awards, 2005-2014
  • Microsoft Technology Transfer Award, 2009, 2014

Special issues

NIPS 2008 workshop

The NIPS 2008 workshop on Speech and Language: Learning-based Methods and Systems covered a variety of advanced topics in speech and language processing. More details can be found on the workshop’s homepage, NIPS08 WSL.