Xu Tan
Senior Researcher
About
Xu Tan (谭旭) is a Senior Researcher in the Machine Learning Group at Microsoft Research Asia (MSRA). His research interests cover machine learning, deep learning, and their applications to natural language, speech, and music processing, including neural machine translation, pre-training, neural architecture search, text to speech, automatic speech recognition, and music understanding and generation. The machine translation systems he developed achieved human parity on Chinese-English machine translation in 2018 and won several first places in the WMT machine translation competition in 2019. He has designed several popular language and speech models, such as MASS and FastSpeech, and has transferred many research results into Microsoft's language and speech products. E-mail: xuta@microsoft.com
Recent Activities
- 2021-06-05: I gave a talk on “Deep Learning based Pop Music Composition” at GAITC 2021.
- 2021-06-02: 2 papers (DeepRapper, MusicBERT) accepted by ACL 2021, 1 paper (NAS-BERT) accepted by KDD 2021, 2 papers (including AdaSpeech 3) accepted by INTERSPEECH 2021.
- 2021-05-20: I gave a webinar talk on Neural TTS.
- 2021-05-14: I gave a talk on AI music composition at Tsinghua University.
- 2021-04-30: Our AdaSpeech has been deployed in Microsoft Azure TTS to support custom voice.
- 2021-02-22: Our MASS has been deployed in Microsoft Azure Translation Service to support NMT for 8 low-resource languages! [News]
- 2021-01-30: 5 papers (AdaSpeech 2, LightSpeech, DenoiSpeech, MixSpeech, MBNet) accepted by ICASSP 2021!
- 2021-01-26: I gave a course on Pre-training Models.
- 2021-01-24: I gave a tutorial on TTS at ISCSLP 2021!
- 2021-01-13: 2 papers (FastSpeech 2, AdaSpeech) accepted by ICLR 2021!
- 2020-12-15: I gave a joint talk with NVIDIA on FastSpeech at NVIDIA GTC China 2020!
- 2020-12-10: Our MPNet has been integrated into Huggingface.
- 2020-12-02: 2 papers (SongMASS, UWSpeech) accepted by AAAI 2021!
- 2020-11-24: Our MASS has been deployed in Microsoft Bing for ads content generation!
- 2020-11-21: Our LRSpeech helped Azure TTS extend to 5 new low-resource languages! [News]
- 2020-09-26: 2 papers (MPNet, SemiNAS) accepted by NeurIPS 2020!
- 2020-07-08: Our FastSpeech has supported more than 70 languages in Microsoft Azure Text to Speech Service! [News-1] [News-2]
- 2019-10-08: I gave a talk in Jiangmen on “Low data/computation resource text to speech” [Article]
- 2019-04-22: Our group won 8 out of 11 machine translation tasks we undertook in WMT 2019 News Translation Competition [Paper] [Link] [Article]
- 2018-03-14: We achieved human performance in Chinese to English News Translation [Paper] [Article-1] [Article-2] [Video]
Research Topics & Projects
- Speech Synthesis and Recognition
- Fast speech synthesis: FastSpeech [Paper], FastSpeech 2 [Paper], LightSpeech [Paper]
- Low-resource TTS and ASR: Almost Unsup TTS/ASR [Paper], LRSpeech [Paper], MixSpeech [Paper]
- Adaptive TTS for custom voice: AdaSpeech [Paper], AdaSpeech 2, AdaSpeech 3
- Multispeaker TTS: MultiSpeech [Paper]; Denoising TTS: DenoiSpeech [Paper]
- Vocoder: PriorGrad [Paper]; MOS evaluation: MBNet [Paper]
- Talking face synthesis: DualLip [Paper]; ASR error correction: FastCorrect [Paper]
- TTS frontend: Grapheme-to-phoneme [Paper], Polyphone disambiguation [Paper]
- Our speech related research works: https://speechresearch.github.io/
- Pre-training
- Neural Machine Translation
- Model structure: [Paper-1] [Paper-2] [Paper-3] [Paper-4] [Paper-5]
- Multilingual and low-resource NMT: [Paper-1] [Paper-2] [Paper-3] [Paper-4] [Paper-5] [Paper-6]
- Non-autoregressive NMT: [Paper-1] [Paper-2] [Paper-3] [Paper-4]
- Linguistic analysis: [Paper-1] [Paper-2]
- Speech translation: SimulSpeech [Paper], UWSpeech [Paper]
- AI Music
- Neural Architecture Search
Publications
- Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu, PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Driven Adaptive Prior, arXiv 2021. [Paper]
- Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, arXiv 2021. [Paper]
- Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu, AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style, INTERSPEECH 2021.
- Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, INTERSPEECH 2021. [Paper]
- Yan Zhao, Weicong Chen, Xu Tan, Kai Huang, Jin Xu, Changhu Wang, Jihong Zhu, Improving Long-Tailed Classification from Instance Level, arXiv 2021. [Paper]
- Jin Xu, Xu Tan, Renqian Luo, Kaitao Song, Jian Li, Tao Qin, Tie-Yan Liu, NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search, KDD 2021. [Paper]
- Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu, MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training, ACL 2021. [Paper]
- Lanqing Xue, Kaitao Song, Duocai Wu, Xu Tan, Nevin L. Zhang, Tao Qin, Wei-Qiang Zhang, Tie-Yan Liu, DeepRapper: Neural Rap Generation with Rhyme and Rhythm Modeling, ACL 2021.
- Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu, AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data, ICASSP 2021. [Paper]
- Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu, LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, ICASSP 2021. [Paper]
- Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu, DenoiSpeech: Denoising Text to Speech with Frame-Level Noise Modeling, ICASSP 2021. [Paper]
- Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu, MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition, ICASSP 2021. [Paper]
- Yichong Leng, Xu Tan, Sheng Zhao, Frank Soong, Xiang-Yang Li, Tao Qin, MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network, ICASSP 2021. [Paper]
- Mingjian Chen, Xu Tan, Bohan Li, Yanqing Liu, Tao Qin, Sheng Zhao, Tie-Yan Liu, AdaSpeech: Adaptive Text to Speech for Custom Voice, ICLR 2021. [Paper]
- Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, ICLR 2021. [Paper] [Blog]
- Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin, SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint, AAAI 2021. [Paper]
- Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu, UWSpeech: Speech to Speech Translation for Unwritten Languages, AAAI 2021. [Paper]
- Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, MPNet: Masked and Permuted Pre-training for Language Understanding, NeurIPS 2020. [Paper] [Blog] [Code@Github] [Code@Huggingface]
- Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu, Semi-Supervised Neural Architecture Search, NeurIPS 2020. [Paper] [Blog] [Code@Github]
- Jiawei Chen, Xu Tan, Jian Luan, Tao Qin, Tie-Yan Liu, HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis, arXiv 2020. [Paper]
- Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, and Tie-Yan Liu, Neural Architecture Search with GBDT, arXiv 2020. [Paper]
- Weicong Chen, Xu Tan, Yingce Xia, Tao Qin, Yu Wang, Tie-Yan Liu, DualLip: A System for Joint Lip Reading and Generation, ACM Multimedia 2020. [Paper]
- Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu, PopMAG: Pop Music Accompaniment Generation, ACM Multimedia 2020. [Paper]
- Peiling Lu, Jie Wu, Jian Luan, Xu Tan, Li Zhou, XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System, INTERSPEECH 2020. [Paper]
- Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu, MultiSpeech: Multi-Speaker Text to Speech with Transformer, INTERSPEECH 2020. [Paper]
- Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu, LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition, KDD 2020. [Paper] [Blog]
- Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu, DeepSinger: Singing Voice Synthesis with Data Mined From the Web, KDD 2020. [Paper]
- Jinglin Liu, Yi Ren, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu, Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation, IJCAI 2020. [Paper]
- Kaitao Song, Xu Tan, Jianfeng Lu, Neural Machine Translation with Error Correction, IJCAI 2020. [Paper]
- Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao and Tie-Yan Liu, SimulSpeech: End-to-End Simultaneous Speech to Text Translation, ACL 2020. [Paper]
- Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao and Tie-Yan Liu, A Study of Non-autoregressive Model for Sequence Generation, ACL 2020. [Paper]
- Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu, LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning, arXiv 2020. [Paper]
- Jiale Chen, Xu Tan, Chaowei Shan, Sen Liu, Zhibo Chen, VESR-Net: The Winning Solution to Youku Video Enhancement and Super-Resolution Challenge, arXiv 2020. [Paper] [Article]
- Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan, ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit, ICASSP 2020. [Paper]
- Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu, Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation, AAAI 2020. [Paper]
- Hao Sun, Xu Tan, Jun-Wei Gan, Sheng Zhao, Dongxu Han, Hongzhi Liu, Tao Qin, and Tie-Yan Liu, Knowledge Distillation from BERT in Pre-training and Fine-tuning for Polyphone Disambiguation, ASRU 2019. [Paper]
- Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu, FastSpeech: Fast, Robust and Controllable Text to Speech, NeurIPS 2019. [Paper] [Demo] [Article] [Reddit]
- Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin and Tie-Yan Liu, Multilingual Neural Machine Translation with Language Clustering, EMNLP 2019. [Paper]
- Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, Yang Fan, Linyuan Gong, Yichong Leng, Renqian Luo, Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu, Microsoft Research Asia’s Systems for WMT19, WMT 2019. [Paper]
- Lijun Wu, Xu Tan, Tao Qin, Jianhuang Lai, Tie-Yan Liu, Beyond Error Propagation: Language Branching Also Affects the Accuracy of Sequence Generation, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 2019. [Paper]
- Yan Lu, Yuanchao Shu, Xu Tan, Yunxin Liu, Mengyu Zhou, Qi Chen, Dan Pei, Collaborative Learning between Cloud and End Devices: An Empirical Study on Location Prediction, ACM/IEEE Symposium on Edge Computing (SEC) 2019. [Paper]
- Tianyu He, Jiale Chen, Xu Tan, Tao Qin, Language Graph Distillation for Low-Resource Machine Translation, arXiv 2019. [Paper]
- Tianyu He, Xu Tan, Tao Qin, Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation, arXiv 2019. [Paper]
- Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu, A Study of Multilingual Neural Machine Translation, arXiv 2019. [Paper]
- Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li and Tie-Yan Liu, Unsupervised Pivot Translation for Distant Languages, ACL 2019. [Paper]
- Tianyu He, Yingce Xia, Jianxin Lin, Xu Tan, Di He, Tao Qin, Zhibo Chen, Deliberation Learning for Image-to-Image Translation, IJCAI 2019.
- Hao Sun, Xu Tan, Jun-Wei Gan, Hongzhi Liu, Sheng Zhao, Tao Qin, Tie-Yan Liu, Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion, INTERSPEECH 2019. [Paper]
- Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Sheng Zhao, Tie-Yan Liu, Almost Unsupervised Text to Speech and Automatic Speech Recognition, ICML 2019. [Paper] [Demo] [Article] [Blog] [Slides] [Video]
- Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, MASS: Masked Sequence to Sequence Pre-training for Language Generation, ICML 2019. [Paper] [Code@Github] [Article] [Blog]
- Xu Tan, Yi Ren, Di He, Tao Qin, Tie-Yan Liu, Multilingual Neural Machine Translation with Knowledge Distillation, ICLR 2019. [Paper] [Code@GitHub]
- Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu, Representation Degeneration Problem in Training Natural Language Generation Models, ICLR 2019. [Paper]
- Junliang Guo, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu, Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input, AAAI 2019. [Paper]
- Chengyue Gong, Xu Tan, Di He, and Tao Qin, Sentence-wise Smooth Regularization for Sequence to Sequence Learning, AAAI 2019. [Paper]
- Yingce Xia, Tianyu He, Xu Tan, Fei Tian, Di He, and Tao Qin, Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder, AAAI 2019. [Paper]
- Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tie-Yan Liu, FRAGE: Frequency-Agnostic Word Representation, NIPS 2018. [Paper] [Code@Github]
- Tianyu He, Xu Tan, Yingce Xia, Di He, Tao Qin, Zhibo Chen, and Tie-Yan Liu, Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation, NIPS 2018. [Paper]
- Lijun Wu, Xu Tan, Di He, Fei Tian, Tao Qin, Jianhuang Lai, and Tie-Yan Liu, Beyond Error Propagation in Neural Machine Translation: Characteristics of Language Also Matter, EMNLP 2018. [Paper]
- Yingce Xia, Xu Tan, Fei Tian, Tao Qin, Nenghai Yu, and Tie-Yan Liu, Model-Level Dual Learning, ICML 2018. [Paper]
- Kaitao Song, Xu Tan, Furong Peng, Jianfeng Lu, Hybrid Self-Attention Network for Machine Translation, arXiv 2018. [Paper]
- Kaitao Song, Xu Tan, Di He, Jianfeng Lu, Tao Qin, and Tie-Yan Liu, Double Path Networks for Sequence to Sequence Learning, COLING 2018. [Paper] [Code@Github]
- Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dongdong Zhang, Zhirui Zhang, Ming Zhou, Achieving Human Parity on Automatic Chinese to English News Translation, arXiv 2018. [Paper] [Article-1] [Article-2] [Video]
- Yanyao Shen, Xu Tan, Di He, Tao Qin, and Tie-Yan Liu, Dense Information Flow for Neural Machine Translation, NAACL 2018. [Paper] [Code@Github]