
Nan Duan

Lead Researcher


Dr. Nan DUAN (段楠) is a Lead Researcher in the Natural Language Computing group at Microsoft Research Asia. He works on fundamental NLP tasks, especially question answering, natural language understanding, multi-modal NLP with visual content, and common sense and reasoning.

Before joining MSRA, he received his Ph.D. in Statistical Machine Translation from Tianjin University in 2011, under the supervision of Dr. Ming ZHOU and Dr. Mu LI.

We are hiring researchers and interns! If you have strong publications and experience in the above areas and are willing to work at MSRA, hit me up.


  • 2019-07-19: Unicoder (The Universal Language Encoder with Cross-lingual Pre-training) shipped to Bing. It holds SOTA results on various cross-lingual NLP tasks, including XNLI.
  • 2018-12-07: Natural Language Understanding (joint work with Bing) was reviewed by Bill Gates.
  • 2018-11-08: Video-based Conversational AI was reviewed by Bill Gates on MSRA’s 20th Anniversary.
  • 2018-09-10: The QA textbook 《智能问答》 (Intelligent Question Answering), co-authored with Ming ZHOU, was published by the Higher Education Press: https://www.msra.cn/zh-cn/news/features/book-recommendation-qa-mt



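  • Nan Duan, Ming Zhou. Intelligent Question Answering (《智能问答》). Higher Education Press, 2018. (in Chinese)
  • Ming Zhou, Nan Duan, Shujie Liu, Yu Wu. Introduction to Artificial Intelligence, Chapter 11: Natural Language Processing (《人工智能导论 – 第11章:自然语言处理》). China Science and Technology Press, 2018. (in Chinese)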

Publications (#: students I mentored/collaborated in MSRA)

  • Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou. A Tensorized Transformer for Language Modeling. arXiv, 2019.
  • Chenfei Wu#, Yanzhao Zhou, Gen Li#, Nan Duan, Duyu Tang, Xiaojie Wang. Deep Reason: A Strong Baseline for Real-World Visual Reasoning. Visual Question Answering and Dialogue Workshop, CVPR, 2019.
  • Yikang Li#, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang. PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph. arXiv, 2019.
  • Botian Shi, Lei Ji, Yaobo Liang, Zhendong Niu, Nan Duan, Ming Zhou. Dense Procedure Captioning in Narrated Instructional Videos. ACL, 2019.
  • Daya Guo, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin. Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing. ACL, 2019.
  • Changzhi Sun, Yeyun Gong, Nan Duan, Ming Gong, Daxin Jiang, Shiliang Sun, Man Lan, Yuanbin Wu, Ming Zhou. Joint Type Inference on Entities and Relations via Graph Convolutional Networks. ACL, 2019.
  • Haoyu Zhang, Yeyun Gong, Nan Duan, Jianjun Xu, Ji Wang, Ming Zhou. Complex Question Decomposition for Semantic Parsing. ACL, 2019.
  • Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan. Knowledge Aware Semantic Concept Expansion for Image-Text Matching. IJCAI, 2019.
  • Bo Shao, Yeyun Gong, Junwei Bao, Xiaola Lin, Jianshu Ji, Guihong Cao, Nan Duan. Weakly Supervised Multi-task Learning for Semantic Parsing. IJCAI, 2019.
  • Junwei Bao#, Duyu Tang, Nan Duan, Zhao Yan, Ming Zhou, Tiejun Zhao. Text Generation from Tables. Transactions on Audio, Speech and Language Processing, 2019.
  • Yibo Sun, Duyu Tang, Nan Duan, Jingjing Xu, Xiaocheng Feng, Bing Qin. Knowledge-Aware Conversational Semantic Parsing Over Web Tables. arXiv, 2018.
  • Daya Guo, Duyu Tang, Nan Duan, Jian Yin, Ming Zhou. Dialog-to-Action: Conversational Question Answering over a Large-Scale Knowledge Base. NeurIPS, 2018.
  • Daya Guo, Yibo Sun, Duyu Tang, Nan Duan, Ming Zhou, Jian Yin. Question Generation from SQL Queries Improves Neural Semantic Parsing. EMNLP, 2018.
  • Junwei Bao#, Yeyun Gong, Nan Duan, Ming Zhou, Tiejun Zhao. Question Generation with Doubly-Adversarial Nets. Transactions on Audio, Speech and Language Processing, 2018.
  • Pan Lu#, Lei Ji, Wei Zhang, Nan Duan, Ming Zhou, Jianyong Wang. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering. KDD, 2018.
  • Yibo Sun#, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng Feng, Bing Qin, Ting Liu, Ming Zhou. Semantic Parsing with Syntax- and Table-Aware SQL Generation. ACL, 2018.
  • Zhao Yan#, Nan Duan, Junwei Bao, Peng Chen, Ming Zhou, Zhoujun Li. Response Selection from Unstructured Documents for Human-Computer Conversation Systems. Knowledge-Based Systems, 2018.
  • Yikang Li#, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang, Ming Zhou. Visual Question Generation as Dual Task of Visual Question Answering. CVPR, 2018.
  • Duyu Tang, Nan Duan, Zhao Yan, Zhirui Zhang, Yibo Sun, Shujie Liu, Yuanhua Lv, Ming Zhou. Learning to Collaborate for Question Answering and Asking. NAACL, 2018.
  • Junwei Bao#, Duyu Tang, Nan Duan, Zhao Yan, Yuanhua Lv, Ming Zhou, Tiejun Zhao. Table-to-Text: Describing Table Region with Natural Language. AAAI, 2018.
  • Zhao Yan#, Duyu Tang, Nan Duan, Shujie Liu, Wendi Wang, Daxin Jiang, Ming Zhou, Zhoujun Li. Assertion-based QA with Question-Aware Open Information Extraction. AAAI, 2018.
  • Yibo Sun, Daya Guo, Duyu Tang, Nan Duan, Zhao Yan, Xiaocheng Feng, Bing Qin. Knowledge Based Machine Reading Comprehension. arXiv, 2018.
  • Wanjun Zhong, Duyu Tang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin. Improving Question Answering by Commonsense-Based Pre-Training. arXiv, 2018.
  • Nan Duan. Overview of the NLPCC 2017 Shared Task: Open Domain QA. NLPCC, 2017.
  • Duyu Tang, Nan Duan, Tao Qin, Zhao Yan, Ming Zhou. Question Answering and Question Generation as Dual Tasks. arXiv, 2017.
  • Zhao Yan#, Duyu Tang, Nan Duan, Junwei Bao, Yuanhua Lv, Ming Zhou, Zhoujun Li. Content-Based Table Retrieval for Web Queries. arXiv, 2017.
  • Nan Duan, Duyu Tang, Peng Chen, Ming Zhou. Question Generation for Question Answering. EMNLP, 2017.
  • Zhao Yan#, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, Zhoujun Li. Building Task-Oriented Dialogue Systems for Online Shopping. AAAI, 2017.
  • Zhao Yan#, Nan Duan, Ming Zhou, Zhoujun Li. An Open Domain Topic Prediction Model for Answer Selection. NLPCC-ICCPOL, 2016.
  • Nan Duan. Overview of the NLPCC-ICCPOL 2016 Shared Task: Open Domain QA. NLPCC-ICCPOL, 2016.
  • Junwei Bao#, Nan Duan, Zhao Yan, Ming Zhou, Tiejun Zhao. Constraint-Based Question Answering with Knowledge Graph. COLING, 2016.
  • Zhao Yan#, Nan Duan, Junwei Bao, Peng Chen, Ming Zhou, Zhoujun Li, Jianshe Zhou. DocChat: An Information Retrieval Approach for Chatbot Engines Using Unstructured Documents. ACL, 2016.
  • Nan Duan. Overview of the NLPCC 2015 Shared Task: Open Domain QA. NLPCC, 2015.
  • Pengcheng Yin#, Nan Duan, Ben Kao, Junwei Bao, Ming Zhou. Answering Questions with Complex Semantic Constraints on Open Knowledge Bases. CIKM, 2015.
  • Min-Chul Yang#, Nan Duan, Ming Zhou, Hae-Chang Rim. Joint Relational Embeddings for Knowledge-based Question Answering. EMNLP, 2014.
  • Junwei Bao#, Nan Duan, Ming Zhou, Tiejun Zhao. Knowledge-based Question Answering as Machine Translation. ACL, 2014.
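  • Nan Duan. Trends in Search Technology from the Perspective of Graph Search (从图谱搜索看搜索技术的发展趋势). Communications of the CCF (《中国计算机学会通讯》), 2013. (in Chinese)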
  • Nan Duan. Minimum Bayes Risk based Answer Re-ranking for Question Answering. ACL, 2013.
  • Chenguang Wang#, Nan Duan, Ming Zhou, Ming Zhang. Paraphrasing Adaptation for Web Search Ranking. ACL, 2013.
  • Hong Sun#, Nan Duan, Yajuan Duan, Ming Zhou. Answer Extraction from Passage Graph for Factoid Question Answering. IJCAI, 2013.
  • Nan Duan, Mu Li, Ming Zhou. Forced Derivation Tree based Model Training to Statistical Machine Translation. EMNLP, 2012.
  • Nan Duan. Consensus Decoding to Statistical Machine Translation. Ph.D. thesis, 2012. (in Chinese)
  • Nan Duan, Mu Li, Ming Zhou. Improving Phrase Extraction via MBR Phrase Scoring and Pruning. MT Summit XIII, 2011.
  • Nan Duan, Mu Li, Ming Zhou. A Comparative Analysis of Consensus Decoding Methods for Statistical Machine Translation. Journal of Chinese Information Processing, 2011. (in Chinese)
  • Nan Duan, Mu Li, Ming Zhou. Hypothesis Mixture Decoding for Statistical Machine Translation. ACL, 2011.
  • Chi-Ho Li, Nan Duan, Yinggong Zhao, Shujie Liu, Lei Cui, Mei-yuh Hwang, Amittai Axelrod, Jianfeng Gao, Yaodong Zhang, Li Deng. The MSRA Machine Translation System for IWSLT 2010. IWSLT, 2010.
  • Nan Duan, Hong Sun, Ming Zhou. Translation Model Generalization using Probability Averaging for Machine Translation. COLING, 2010.
  • Nan Duan, Mu Li, Dongdong Zhang, Ming Zhou. Mixture Model-based Minimum Bayes Risk Decoding using Multiple Machine Translation Systems. COLING, 2010.
  • Nan Duan, Mu Li, Tong Xiao, Ming Zhou. The Feature Subspace Method for SMT System Combination. EMNLP, 2009.
  • Mu Li, Nan Duan, Dongdong Zhang, Chi-Ho Li, Ming Zhou. Collaborative Decoding: Partial Hypothesis Re-ranking using Translation Consensus between Decoders. ACL, 2009.
  • Dongdong Zhang, Chi-Ho Li, Nan Duan, Shujie Liu, Mu Li, Ming Zhou. MSRA Technical Report for the 5th China Workshop on Machine Translation. CWMT, 2009.
  • Dongdong Zhang, Mu Li, Nan Duan, Chi-Ho Li, Ming Zhou. Measure Word Generation for English-Chinese SMT Systems. ACL, 2008.

Transfers & Patents

Technology Transfers

  1. Unicoder: The Universal Language Encoder with Cross-lingual Pre-training for Bing (joint work with Yaobo Liang and Haoyang Huang), 2019.
  2. Video-based QA for Bing (joint work with Lei Ji), 2019.
  3. QA-aware Pre-training for Bing (joint work with Yaobo Liang), 2018.
  4. Neural Semantic Parser for Cortana (joint work with Yeyun Gong and Duyu Tang), 2018.
  5. Question-aware Neural Open IE for Bing QA (joint work with Duyu Tang and Yaobo Liang), 2018.
  6. Text-based QA for Bing QA (joint work with Duyu Tang), 2017.
  7. Table-based QA for Bing QA (joint work with Duyu Tang), 2017.
  8. List-based QA for Bing QA (joint work with Duyu Tang), 2017.
  9. Knowledge-based QA for Xiaoice Core Chat, 2017.
  10. DocChat for Xiaoice Customer Service, 2016.
  11. Text Paraphrasing for EMOI Service in Sogou Mobile IME, 2016.
  12. Task-Oriented Dialogue System for Xiaoice Shopping Assistant on JD.COM, 2015.
  13. Query Rewriting for Bing Ads & Relevance, 2014.
  14. SCFG-based Semantic Parsing for Bing QA, 2014.
  15. NLP Ranker for Bing Relevance, 2013.


Patents

  1. Cross-lingual Pre-training for Search, Ads and News, 2019.
  2. User Intent Understanding with Transfer Learning, 2019.
  3. Controllable Text Style Transfer, 2019.
  4. Multi-modal QA/Chat, 2018.
  5. Knowledge Graph-based Conversational Question Answering, 2018.
  6. Assertion-based Question Answering, 2017.
  7. Generation of Text from Structured Data, 2017.
  8. Document-based Chat (DocChat), 2016.

Activities & Talks


  • Evaluation Co-Chair. NLPCC, 2016, 2017, 2018, 2019.
  • Distinguished Speaker. CCF, 2017.
  • Secretary of Committee on Terminology. CCF, 2016-2017.

NLP Lectures (2017-present)

  • Peking University
  • Tsinghua University
  • Tianjin University
  • Nankai University
  • Chinese Academy of Sciences
  • Nanjing University of Aeronautics and Astronautics



Here are some NLP datasets constructed by my team:

(1) MSParS (https://github.com/msra-nlc/MSParS)

It’s an open domain semantic parsing dataset.

(2) WebAssertions (https://github.com/msra-nlc/WebAssertions)

It’s a question-aware open IE dataset.

(3) ChineseKBQA (https://github.com/msra-nlc/ChineseKBQA)

It’s an open domain knowledge-based QA dataset in Chinese.

(4) ChineseDBQA (https://github.com/msra-nlc/ChineseDBQA)

It’s an open domain document-based QA dataset in Chinese.

(5) Table2Text (https://github.com/msra-nlc/Table2Text)

It’s a table-to-text generation dataset.
