Portrait of Xu Tan

Xu Tan

Principal Research Manager

About

Xu Tan is a Principal Researcher and Research Manager at Microsoft. His research covers generative AI and language/speech/music/avatar/video processing. He has published many papers at top AI conferences, with over 10,000 citations, and has transferred many technologies to Microsoft products (e.g., Azure, Bing, and .NET). He has developed machine translation and speech synthesis systems that achieved human-level quality, and won first place in the WMT machine translation competition and the Blizzard speech synthesis challenge. He has designed several popular language/speech/music models and systems (e.g., MASS, FastSpeech/NaturalSpeech, Muzic), published the first book on neural text-to-speech synthesis, and won the best student paper award at ISMIR 2023. He is a senior member of IEEE, an Action Editor of Transactions on Machine Learning Research (TMLR), and an Area Chair for several AI conferences (e.g., NeurIPS, AAAI, ICASSP). E-mail: tanxu2012@gmail.com.

We are hiring!

We are hiring full-time researchers on Audio/Video Generation and LLMs! Please email me (tanxu2012@gmail.com) if you are interested.

Collaborative Papers

(For the full publication list, please go to https://scholar.google.com/citations?user=tob-U1oAAAAJ)

  • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu, NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality, arXiv 2022. [Paper] [Demo]
  • Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu, A Survey on Neural Speech Synthesis, arXiv 2021. [Paper] [Article-1] [Article-2] [Github]
  • Chang Liu, Xu Tan, Chongyang Tao, Zhenxin Fu, Dongyan Zhao, Tie-Yan Liu, Rui Yan, ProphetChat: Enhancing Dialogue Generation with Simulation of Future Conversation, ACL 2022. [Paper]
  • Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, Revisiting Over-Smoothness in Text to Speech, ACL 2022. [Paper]
  • Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu, PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior, ICLR 2022. [Paper]
  • Zehua Chen, Xu Tan, Ke Wang, Shifeng Pan, Danilo Mandic, Lei He, Sheng Zhao, InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training, ICASSP 2022. [Paper]
  • Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao, Transformer-S2A: Robust and Efficient Speech-to-Animation, ICASSP 2022. [Paper]
  • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee, A Study on the Efficacy of Model Pre-training in Developing Neural Text-to-Speech System, ICASSP 2022. [Paper]
  • Yan Zhao, Weicong Chen, Xu Tan, Kai Huang, Jihong Zhu, Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition, AAAI 2022.
  • Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao, DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021, Blizzard Challenge 2021. [Paper]
  • Jin Xu, Xu Tan, Kaitao Song, Renqian Luo, Yichong Leng, Tao Qin, Tie-Yan Liu, Jian Li, Analyzing and Mitigating Interference in Neural Architecture Search, arXiv 2021. [Paper]
  • Chen Zhang, Jiaxing Yu, LuChin Chang, Xu Tan, Jiawei Chen, Tao Qin, Kejun Zhang, PDAugment: Data Augmentation by Pitch and Duration Adjustments for Automatic Lyrics Transcription, arXiv 2021. [Paper]
  • Zeqian Ju, Peiling Lu, Xu Tan, Rui Wang, Chen Zhang, Songruoyao Wu, Kejun Zhang, Xiangyang Li, Tao Qin, Tie-Yan Liu, TeleMelody: Lyric-to-Melody Generation with a Template-Based Two-Stage Method, arXiv 2021. [Paper]
  • Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition, EMNLP 2021. [Paper]
  • Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, NeurIPS 2021. [Paper]
  • Jiawei Chen, Xu Tan, Yichong Leng, Jin Xu, Guihua Wen, Tao Qin, Tie-Yan Liu, Speech-T: Transducer for Text to Speech and Beyond, NeurIPS 2021.
  • Xu Tan, Xiaobing Li, A Tutorial on AI Music Composition, ACM Multimedia 2021. [Paper]
  • Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu, AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style, INTERSPEECH 2021. [Paper]
  • Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, INTERSPEECH 2021. [Paper]
  • Rui Wang, Xu Tan, Renqian Luo, Tao Qin, Tie-Yan Liu, A Survey on Low-Resource Neural Machine Translation, IJCAI 2021. [Paper]
  • Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu, NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search, KDD 2021. [Paper]
  • Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, ACL 2021. [Paper] [Article]
  • Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu, DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling, ACL 2021. [Paper] [Article-1] [Article-2]
  • Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu, AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data, ICASSP 2021. [Paper]
  • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, ICASSP 2021. [Paper]
  • Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling, ICASSP 2021. [Paper]
  • Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu, MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition, ICASSP 2021. [Paper]
  • Yichong Leng, Xu Tan, Sheng Zhao, Frank Soong, Xiang-Yang Li, Tao Qin, MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network, ICASSP 2021. [Paper]
  • Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, AdaSpeech: Adaptive Text to Speech for Custom Voice, ICLR 2021. [Paper]
  • Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, ICLR 2021. [Paper] [Blog]
  • Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin, SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, AAAI 2021. [Paper]
  • Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu, UWSpeech: Speech to Speech Translation for Unwritten Languages, AAAI 2021. [Paper]
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, MPNet: Masked and Permuted Pre-training for Language Understanding, NeurIPS 2020. [Paper] [Blog] [Code@Github] [Code@Huggingface]
  • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu, Semi-Supervised Neural Architecture Search, NeurIPS 2020. [Paper] [Blog] [Code@Github]
  • Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu, HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis, arXiv 2020. [Paper]
  • Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, and Tie-Yan Liu, Neural Architecture Search with GBDT, arXiv 2020. [Paper]
  • Weicong Chen, Xu Tan, Yingce Xia, Tao Qin, Yu Wang, Tie-Yan Liu, DualLip: A System for Joint Lip Reading and Generation, ACM Multimedia 2020. [Paper]
  • Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, PopMAG: Pop Music Accompaniment Generation, ACM Multimedia 2020. [Paper]
  • Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou, XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System, INTERSPEECH 2020. [Paper]
  • Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu, MultiSpeech: Multi-Speaker Text to Speech with Transformer, INTERSPEECH 2020. [Paper]
  • Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu, LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition, KDD 2020. [Paper] [Blog]
  • Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu, DeepSinger: Singing Voice Synthesis with Data Mined From the Web, KDD 2020. [Paper]
  • Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu, Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation, IJCAI 2020. [Paper]
  • Kaitao Song, Xu Tan, Jianfeng Lu, Neural Machine Translation with Error Correction, IJCAI 2020. [Paper]
  • Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao and Tie-Yan Liu, SimulSpeech: End-to-End Simultaneous Speech to Text Translation, ACL 2020. [Paper]
  • Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao and Tie-Yan Liu, A Study of Non-autoregressive Model for Sequence Generation, ACL 2020. [Paper]
  • Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu, LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning, arXiv 2020. [Paper]
  • Jiale Chen, Xu Tan, Chaowei Shan, Sen Liu, Zhibo Chen, VESR-Net: The Winning Solution to Youku Video Enhancement and Super-Resolution Challenge, arXiv 2020. [Paper] [Article]
  • Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan, ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit, ICASSP 2020. [Paper]
  • Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu, Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation, AAAI 2020. [Paper]
  • Hao Sun, Xu Tan, Jun-Wei Gan, Sheng Zhao, Dongxu Han, Hongzhi Liu, Tao Qin, and Tie-Yan Liu, Knowledge Distillation from BERT in Pre-training and Fine-tuning for Polyphone Disambiguation, ASRU 2019. [Paper]
  • Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, FastSpeech: Fast, Robust and Controllable Text to Speech, NeurIPS 2019. [Paper] [Demo] [Article] [Reddit]
  • Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin and Tie-Yan Liu, Multilingual Neural Machine Translation with Language Clustering, EMNLP 2019. [Paper]
  • Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu, Microsoft Research Asia’s Systems for WMT19, WMT 2019. [Paper]
  • Lijun Wu, Xu Tan, Tao Qin, Jianhuang Lai, Tie-Yan Liu, Beyond Error Propagation: Language Branching Also Affects the Accuracy of Sequence Generation, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 2019. [Paper]
  • Yan Lu, Yuanchao Shu, Xu Tan, Yunxin Liu, Mengyu Zhou, Qi Chen, Dan Pei, Collaborative Learning between Cloud and End Devices: An Empirical Study on Location Prediction, ACM/IEEE Symposium on Edge Computing (SEC) 2019. [Paper]
  • Tianyu He, Jiale Chen, Xu Tan, Tao Qin, Language Graph Distillation for Low-Resource Machine Translation, arXiv 2019. [Paper]
  • Tianyu He, Xu Tan, Tao Qin, Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation, arXiv 2019. [Paper]
  • Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu, A Study of Multilingual Neural Machine Translation, arXiv 2019. [Paper]
  • Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li and Tie-Yan Liu, Unsupervised Pivot Translation for Distant Languages, ACL 2019. [Paper]
  • Tianyu He, Yingce Xia, Jianxin Lin, Xu Tan, Di He, Tao Qin, Zhibo Chen, Deliberation Learning for Image-to-Image Translation, IJCAI 2019.
  • Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu, Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion, INTERSPEECH 2019. [Paper]
  • Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Sheng Zhao, Tie-Yan Liu, Almost Unsupervised Text to Speech and Automatic Speech Recognition, ICML 2019. [Paper] [Demo] [Article] [Blog] [Slides] [Video]
  • Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, MASS: Masked Sequence to Sequence Pre-training for Language Generation, ICML 2019. [Paper] [Code@Github] [Article] [Blog]
  • Xu Tan, Yi Ren, Di He, Tao Qin, Tie-Yan Liu, Multilingual Neural Machine Translation with Knowledge Distillation, ICLR 2019. [Paper] [Code@GitHub]
  • Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu, Representation Degeneration Problem in Training Natural Language Generation Models, ICLR 2019. [Paper]
  • Junliang Guo, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu, Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input, AAAI 2019. [Paper]
  • Chengyue Gong, Xu Tan, Di He, and Tao Qin, Sentence-wise Smooth Regularization for Sequence to Sequence Learning, AAAI 2019. [Paper]
  • Yingce Xia, Tianyu He, Xu Tan, Fei Tian, Di He, and Tao Qin, Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder, AAAI 2019. [Paper]
  • Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tie-Yan Liu, FRAGE: Frequency-Agnostic Word Representation, NIPS 2018. [Paper] [Code@Github]
  • Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu, Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation, NIPS 2018. [Paper]
  • Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu, Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter, EMNLP 2018. [Paper]
  • Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, and Tie-Yan Liu, Model-Level Dual Learning, ICML 2018. [Paper]
  • Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu, Hybrid Self-Attention Network for Machine Translation, arXiv 2018. [Paper]
  • Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu, Double Path Networks for Sequence to Sequence Learning, COLING 2018. [Paper] [Code@Github]
  • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, Ming Zhou, Achieving Human Parity on Automatic Chinese to English News Translation, arXiv 2018. [Paper] [Article-1] [Article-2] [Video]
  • Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu, Dense Information Flow for Neural Machine Translation, NAACL 2018. [Paper] [Code@Github]