{"id":699085,"date":"2020-10-19T15:23:24","date_gmt":"2020-10-19T22:23:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&#038;p=699085"},"modified":"2025-08-06T11:52:21","modified_gmt":"2025-08-06T18:52:21","slug":"interspeech-2020","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2020\/","title":{"rendered":"Microsoft at INTERSPEECH 2020"},"content":{"rendered":"\n\n<p><strong>Website<\/strong>: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.interspeech2020.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">INTERSPEECH 2020<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Microsoft is proud to be a gold sponsor of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.interspeech2020.org\/\" target=\"_blank\" rel=\"noopener\">INTERSPEECH 2020<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. See more details on our contributions on the sessions tab.<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p><em>All times are displayed in GMT +8<\/em><\/p>\n<h2>Sunday, October 25<\/h2>\n<p>20:00 \u2013 21:30 | Tutorial B-2-1<br \/>\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a><\/p>\n<p>20:00 \u2013 21:30 | Tutorial B-3-1<br \/>\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong><br \/>\nKyu J. 
Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a><\/p>\n<p>21:45 \u2013 23:15 | Tutorial B-2-2<br \/>\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a><\/p>\n<p>21:45 \u2013 23:15 | Tutorial B-3-2<br \/>\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong><br \/>\nKyu J. Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a><\/p>\n<h2>Monday, October 26<\/h2>\n<p>19:15 \u2013 20:15 | ASR neural network architectures I<br \/>\n<strong>On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition<\/strong> (Microsoft Research Asia)<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a><\/p>\n<p>19:15 \u2013 20:15 | ASR neural network architectures I<br \/>\n<strong>Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, Zhuo Chen, Tianyan Zhou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>19:15 \u2013 20:15 | Multi-channel speech enhancement<br \/>\n<strong>Online directional speech enhancement using geometrically constrained 
independent vector analysis<\/strong><br \/>\nLi Li, Kazuhito Koishida, Shoji Makino<\/p>\n<p>19:15 \u2013 20:15 | Multi-channel speech enhancement<br \/>\n<strong>An End-to-end Architecture of Online Multi-channel Speech Separation<\/strong><br \/>\nJian Wu, Zhuo Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>, Zhili Tan<\/p>\n<p>19:15 \u2013 20:15 | Speech Signal Representation<br \/>\n<strong>Robust pitch regression with voiced\/unvoiced classification in nonstationary noise environments<\/strong><br \/>\nDung Tran, Uros Batricevic, Kazuhito Koishida<\/p>\n<p>19:15 \u2013 20:15 | Speaker Diarization<br \/>\n<strong>Online Speaker Diarization with Relation Network<\/strong><br \/>\nXiang Li, Yucheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cluo\/\">Chong Luo<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/wezeng\/\">Wenjun Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Speaker Diarization<br \/>\n<strong>Speaker attribution with voice profiles by graph-based semi-supervised learning<\/strong><br \/>\nJixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto) and Michael Brudno (University of Toronto)<\/p>\n<p>19:15 \u2013 20:15 | Noise robust and distant speech recognition<br \/>\n<strong>Neural Speech Separation Using Spatially Distributed Microphones<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dowan\/\">Dongmei Wang<\/a>, Zhuo Chen and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>20:30 \u2013 21:30 | ASR neural network architectures and training I<br \/>\n<strong>Fast and Slow Acoustic Model<\/strong><br \/>\nKshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu<\/p>\n<p>20:30 \u2013 21:30 | Evaluation of Speech 
Technology Systems and Methods for Resource Construction and Annotation<br \/>\n<strong>Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System<\/strong><br \/>\nKai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan<\/p>\n<p>20:30 \u2013 21:30 | ASR model training and strategies<br \/>\n<strong>Semantic Mask for Transformer based End-to-End Speech Recognition<\/strong><br \/>\nChengyi Wang, Yu Wu, Yujiao Du, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a><\/p>\n<p>20:30 \u2013 21:30 | ASR model training and strategies<br \/>\n<strong>A Federated Approach in Training Acoustic Models<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, Kenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a><\/p>\n<p>21:45 \u2013 22:45 | Cross\/multi-lingual and code-switched speech recognition<br \/>\n<strong>A 43 Language Multilingual Punctuation Prediction Neural Network Model<\/strong><br \/>\nXinxing Li, Edward Lin<\/p>\n<p>21:45 \u2013 22:45 | Singing Voice Computing and Processing in Music<br \/>\n<strong>Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music<\/strong><br \/>\nYuanbo Hou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a>, Jian Luan, Shengchen Li<\/p>\n<p>21:45 \u2013 22:45 | Acoustic model adaptation for ASR<br \/>\n<strong>Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator<\/strong><br \/>\nYan Huang, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Lei He, Wenning Wei, William Gale, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<p>21:45 \u2013 22:45 | Singing and Multimodal Synthesis<br \/>\n<strong>Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer<\/strong><br \/>\nJie Wu, Jian Luan<\/p>\n<p>21:45 \u2013 22:45 | Singing and Multimodal Synthesis<br \/>\n<strong>XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System<\/strong><br \/>\nPeiling Lu, Jie Wu, Jian Luan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Li Zhou<\/p>\n<p>21:45 \u2013 22:45 | Student Events<br \/>\n<strong>ISCA-SAC: 2nd Mentoring Event<\/strong><br \/>\nMentor: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a><\/p>\n<h2>Tuesday, October 27<\/h2>\n<p>19:15 \u2013 20:15 | Feature extraction and distant ASR<br \/>\n<strong>Bandpass Noise Generation and Augmentation for Unified ASR<\/strong><br \/>\nKshitiz Kumar, Bo Ren, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu<\/p>\n<p>19:15 \u2013 20:15 | Search for speech recognition<br \/>\n<strong>Combination of end-to-end and hybrid models for speech recognition<\/strong><br \/>\nJeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<h2>Wednesday, October 28<\/h2>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM<\/strong><br 
\/>\nKshitiz Kumar, Chaojun Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu<\/p>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>Low Latency End-to-End Streaming Speech Recognition with a Scout Network<\/strong><br \/>\nChengyi Wang, Yu Wu, Liang Lu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Guoli Ye, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a><\/p>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>Transfer Learning Approaches for Streaming End-to-End Speech Recognition System<\/strong><br \/>\nVikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a><\/p>\n<p>19:15 \u2013 20:15 | Applications of ASR<br \/>\n<strong>SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems<\/strong><br \/>\nHuili Chen, Bita Darvish Rouhani, Farinaz Koushanfar<\/p>\n<p>19:15 \u2013 20:15 | Single-channel speech enhancement I<br \/>\n<strong>Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks<\/strong><br \/>\nAhmet E. 
Bulut, Kazuhito Koishida<\/p>\n<p>19:15 \u2013 20:15 | Single-channel speech enhancement I<br \/>\n<strong>Single-channel speech enhancement by subspace affinity minimization<\/strong><br \/>\nDung Tran, Kazuhito Koishida<\/p>\n<p>19:15 \u2013 20:15 | Deep Noise Suppression Challenge<br \/>\n<strong>The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results<\/strong><br \/>\nChandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sebraun\/\">Sebastian Braun<\/a>, Puneet Rana, Sriram Srinivasan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chiw\/\">Johannes Gehrke<\/a><\/p>\n<p>20:30 \u2013 21:30 | Spoken Term Detection<br \/>\n<strong>Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting<\/strong><br \/>\nKun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song<\/p>\n<p>20:30 \u2013 21:30 | Training strategies for ASR<br \/>\n<strong>Serialized Output Training for End-to-End Overlapped Speech Recognition<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>20:30 \u2013 21:30 | Speech transmission &amp; coding<br \/>\n<strong>An Open source Implementation of ITU-T Recommendation P.808 with Validation<\/strong><br \/>\nBabak Naderi, Ross Cutler<\/p>\n<p>20:30 \u2013 21:30 | Speech transmission &amp; coding<br \/>\n<strong>DNN No-Reference PSTN Speech Quality Prediction<\/strong><br \/>\nGabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, 
Robert Aichner<\/p>\n<p>20:30 \u2013 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches<br \/>\n<strong>On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model<\/strong><br \/>\nShubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II<br \/>\n<strong>Towards Universal Text-to-Speech<\/strong><br \/>\nJingzhou Yang, Lei He<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II<br \/>\n<strong>Enhancing Monotonicity for Robust Autoregressive Transformer TTS<\/strong><br \/>\nXiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion<br \/>\n<strong>Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis<\/strong><br \/>\nYukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion<br \/>\n<strong>GAN-based Data Generation for Speech Emotion Recognition<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Kenichi Kumatani<\/p>\n<p>21:45 \u2013 22:45 | Student Events<br \/>\n<strong>ISCA-SAC: 7th Students Meet the Experts<\/strong><br \/>\nPanelist: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2018-special-session-low-resource-speech-recognition-challenge-indian-languages\/\">Sunayana Sitaram<\/a><\/p>\n<h2>Thursday, October 29<\/h2>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Neural Waveform Generation II<br \/>\n<strong>An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis<\/strong><br \/>\nYang Cui, Xi Wang, Lei He, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a><\/p>\n<p>19:15 \u2013 20:15 | ASR neural network architectures and training II<br \/>\n<strong>Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sarangp\/\">Sarangarajan Parthasarathy<\/a>, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<p>19:15 \u2013 20:15 | New Trends in self-supervised speech processing<br \/>\n<strong>Sequence-level Self-learning with Multiple Hypotheses<\/strong><br \/>\nKenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Spoken Dialogue System<br \/>\n<strong>Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yaoqian\/\">Yao Qian<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yushi\/\">Yu Shi<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Spoken Dialogue System<br \/>\n<strong>Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task<\/strong><br \/>\nXinnuo Xu, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yizzhang\/\">Yizhe Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/laliden\/\">Lars Liden<\/a>, Sungjin Lee<\/p>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis<br \/>\n<strong>MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search<\/strong><br \/>\nNaihan Li, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Yanqing Liu, Sheng Zhao, Ming Liu<\/p>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis<br \/>\n<strong>MultiSpeech: Multi-Speaker Text to Speech with Transformer<\/strong><br \/>\nMingjian Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/taoqin\/\">Tao Qin<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tyliu\/\">Tie-Yan Liu<\/a><\/p>\n<p>20:30 \u2013 21:30 | Speech Synthesis: Prosody Modeling<br \/>\n<strong>Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency<\/strong><br \/>\nMatt Whitehill, Shuang Ma, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/damcduff\/\">Daniel McDuff<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yalesong\/\">Yale Song<\/a><\/p>\n<p>21:45 \u2013 22:45 | Multilingual and code-switched ASR<br \/>\n<strong>Improving Low Resource Code-switched ASR using Augmented Code-switched TTS<\/strong><br \/>\nYash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi<\/p>\n<p>21:45 \u2013 22:45 | ASR neural network architectures II \u2013 Transformers<br \/>\n<strong>Exploring Transformers for Large-Scale Speech Recognition<\/strong><br \/>\nLiang Lu, Changliang Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft is proud to be a gold sponsor of INTERSPEECH 2020.<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2020-10-25","msr_enddate":"2020-10-29","msr_location":"Virtual","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":false,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[13545],"msr-region":[256048],"msr-event-type":[197941],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-699085","msr-event","type-msr-event","status-publish","hentry","msr-research-area-human-language-technologies","msr-region-global","msr-event-type-conferences","msr-locale-en_us"],"msr_about":"<!-- wp:msr\/event-details {\"title\":\"Microsoft at INTERSPEECH 2020\",\"backgroundColor\":\"grey\"} \/-->\n\n<!-- wp:msr\/content-tabs --><!-- wp:msr\/content-tab {\"title\":\"About\"} --><!-- wp:freeform --><p><strong>Website<\/strong>: <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.interspeech2020.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">INTERSPEECH 2020<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Microsoft is proud to be a gold sponsor of <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/www.interspeech2020.org\/\" 
target=\"_blank\" rel=\"noopener\">INTERSPEECH 2020<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. See more details on our contributions on the sessions tab.<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Sessions\"} --><!-- wp:freeform --><p><em>All times are displayed in GMT +8<\/em><\/p>\n<h2>Sunday, October 25<\/h2>\n<p>20:00 \u2013 21:30 | Tutorial B-2-1<br \/>\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a><\/p>\n<p>20:00 \u2013 21:30 | Tutorial B-3-1<br \/>\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong><br \/>\nKyu J. Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a><\/p>\n<p>21:45 \u2013 23:15 | Tutorial B-2-2<br \/>\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a><\/p>\n<p>21:45 \u2013 23:15 | Tutorial B-3-2<br \/>\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong><br \/>\nKyu J. 
Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a><\/p>\n<h2>Monday, October 26<\/h2>\n<p>19:15 \u2013 20:15 | ASR neural network architectures I<br \/>\n<strong>On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition<\/strong> (Microsoft Research Asia)<br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a><\/p>\n<p>19:15 \u2013 20:15 | ASR neural network architectures I<br \/>\n<strong>Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, Zhuo Chen, Tianyan Zhou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>19:15 \u2013 20:15 | Multi-channel speech enhancement<br \/>\n<strong>Online directional speech enhancement using geometrically constrained independent vector analysis<\/strong><br \/>\nLi Li, Kazuhito Koishida, Shoji Makino<\/p>\n<p>19:15 \u2013 20:15 | Multi-channel speech enhancement<br \/>\n<strong>An End-to-end Architecture of Online Multi-channel Speech Separation<\/strong><br \/>\nJian Wu, Zhuo Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>, Zhili Tan<\/p>\n<p>19:15 \u2013 20:15 | Speech Signal Representation<br \/>\n<strong>Robust pitch regression with voiced\/unvoiced classification in nonstationary noise environments<\/strong><br \/>\nDung Tran, Uros Batricevic, Kazuhito 
Koishida<\/p>\n<p>19:15 \u2013 20:15 | Speaker Diarization<br \/>\n<strong>Online Speaker Diarization with Relation Network<\/strong><br \/>\nXiang Li, Yucheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cluo\/\">Chong Luo<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/wezeng\/\">Wenjun Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Speaker Diarization<br \/>\n<strong>Speaker attribution with voice profiles by graph-based semi-supervised learning<\/strong><br \/>\nJixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto) and Michael Brudno (University of Toronto)<\/p>\n<p>19:15 \u2013 20:15 | Noise robust and distant speech recognition<br \/>\n<strong>Neural Speech Separation Using Spatially Distributed Microphones<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dowan\/\">Dongmei Wang<\/a>, Zhuo Chen and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>20:30 \u2013 21:30 | ASR neural network architectures and training I<br \/>\n<strong>Fast and Slow Acoustic Model<\/strong><br \/>\nKshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu<\/p>\n<p>20:30 \u2013 21:30 | Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation<br \/>\n<strong>Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System<\/strong><br \/>\nKai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan<\/p>\n<p>20:30 \u2013 21:30 | ASR model training and strategies<br \/>\n<strong>Semantic Mask for Transformer based End-to-End Speech Recognition<\/strong><br \/>\nChengyi Wang, Yu Wu, Yujiao Du, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a><\/p>\n<p>20:30 \u2013 21:30 | ASR model training and strategies<br \/>\n<strong>A Federated Approach in Training Acoustic Models<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, Kenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a><\/p>\n<p>21:45 \u2013 22:45 | Cross\/multi-lingual and code-switched speech recognition<br \/>\n<strong>A 43 Language Multilingual Punctuation Prediction Neural Network Model<\/strong><br \/>\nXinxing Li, Edward Lin<\/p>\n<p>21:45 \u2013 22:45 | Singing Voice Computing and Processing in Music<br \/>\n<strong>Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music<\/strong><br \/>\nYuanbo Hou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a>, Jian Luan, Shengchen Li<\/p>\n<p>21:45 \u2013 22:45 | Acoustic model adaptation for ASR<br \/>\n<strong>Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator<\/strong><br \/>\nYan Huang, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Lei He, Wenning Wei, William Gale, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<p>21:45 \u2013 22:45 | Singing and Multimodal Synthesis<br \/>\n<strong>Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer<\/strong><br \/>\nJie Wu, Jian Luan<\/p>\n<p>21:45 \u2013 22:45 | Singing and Multimodal Synthesis<br \/>\n<strong>XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System<\/strong><br 
\/>\nPeiling Lu, Jie Wu, Jian Luan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Li Zhou<\/p>\n<p>21:45 \u2013 22:45 | Student Events<br \/>\n<strong>ISCA-SAC: 2nd Mentoring Event<\/strong><br \/>\nMentor: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a><\/p>\n<h2>Tuesday, October 27<\/h2>\n<p>19:15 \u2013 20:15 | Feature extraction and distant ASR<br \/>\n<strong>Bandpass Noise Generation and Augmentation for Unified ASR<\/strong><br \/>\nKshitiz Kumar, Bo Ren, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu<\/p>\n<p>19:15 \u2013 20:15 | Search for speech recognition<br \/>\n<strong>Combination of end-to-end and hybrid models for speech recognition<\/strong><br \/>\nJeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<h2>Wednesday, October 28<\/h2>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM<\/strong><br \/>\nKshitiz Kumar, Chaojun Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu<\/p>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>Low Latency End-to-End Streaming Speech Recognition with a Scout Network<\/strong><br \/>\nChengyi Wang, Yu Wu, Liang Lu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Guoli Ye, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a><\/p>\n<p>19:15 \u2013 20:15 | Streaming ASR<br \/>\n<strong>Transfer Learning Approaches for 
Streaming End-to-End Speech Recognition System<\/strong><br \/>\nVikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a><\/p>\n<p>19:15 \u2013 20:15 | Applications of ASR<br \/>\n<strong>SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems<\/strong><br \/>\nHuili Chen, Bita Darvish Rouhani, Farinaz Koushanfar<\/p>\n<p>19:15 \u2013 20:15 | Single-channel speech enhancement I<br \/>\n<strong>Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks<\/strong><br \/>\nAhmet E. Bulut, Kazuhito Koishida<\/p>\n<p>19:15 \u2013 20:15 | Single-channel speech enhancement I<br \/>\n<strong>Single-channel speech enhancement by subspace affinity minimization<\/strong><br \/>\nDung Tran, Kazuhito Koishida<\/p>\n<p>19:15 \u2013 20:15 | Deep Noise Suppression Challenge<br \/>\n<strong>The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results<\/strong><br \/>\nChandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sebraun\/\">Sebastian Braun<\/a>, Puneet Rana, Sriram Srinivasan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chiw\/\">Johannes Gehrke<\/a><\/p>\n<p>20:30 \u2013 21:30 | Spoken Term Detection<br \/>\n<strong>Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting<\/strong><br \/>\nKun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia Jia, Helen Meng, Binheng Song<\/p>\n<p>20:30 \u2013 21:30 | Training strategies for ASR<br \/>\n<strong>Serialized Output Training for End-to-End Overlapped Speech Recognition<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a><\/p>\n<p>20:30 \u2013 21:30 | Speech transmission &amp; coding<br \/>\n<strong>An Open source Implementation of ITU-T Recommendation P.808 with Validation<\/strong><br \/>\nBabak Naderi, Ross Cutler<\/p>\n<p>20:30 \u2013 21:30 | Speech transmission &amp; coding<br \/>\n<strong>DNN No-Reference PSTN Speech Quality Prediction<\/strong><br \/>\nGabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner<\/p>\n<p>20:30 \u2013 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches<br \/>\n<strong>On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model<\/strong><br \/>\nShubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II<br \/>\n<strong>Towards Universal Text-to-Speech<\/strong><br \/>\nJingzhou Yang, Lei He<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II<br \/>\n<strong>Enhancing Monotonicity for Robust Autoregressive Transformer TTS<\/strong><br \/>\nXiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion<br \/>\n<strong>Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis<\/strong><br \/>\nYukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda<\/p>\n<p>21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion<br \/>\n<strong>GAN-based Data Generation for Speech Emotion Recognition<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Kenichi Kumatani<\/p>\n<p>21:45 \u2013 22:45 | Student Events<br \/>\n<strong>ISCA-SAC: 7th Students Meet the Experts<\/strong><br \/>\nPanelist: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2018-special-session-low-resource-speech-recognition-challenge-indian-languages\/\">Sunayana Sitaram<\/a><\/p>\n<h2>Thursday, October 29<\/h2>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Neural Waveform Generation II<br \/>\n<strong>An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis<\/strong><br \/>\nYang Cui, Xi Wang, Lei He, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a><\/p>\n<p>19:15 \u2013 20:15 | ASR neural network architectures and training II<br \/>\n<strong>Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sarangp\/\">Sarangarajan Parthasarathy<\/a>, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<p>19:15 \u2013 20:15 | New Trends in self-supervised speech processing<br \/>\n<strong>Sequence-level Self-learning with Multiple Hypotheses<\/strong><br \/>\nKenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Spoken Dialogue System<br \/>\n<strong>Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog<\/strong><br \/>\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yaoqian\/\">Yao Qian<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yushi\/\">Yu Shi<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a><\/p>\n<p>19:15 \u2013 20:15 | Spoken Dialogue System<br \/>\n<strong>Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task<\/strong><br \/>\nXinnuo Xu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yizzhang\/\">Yizhe Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/laliden\/\">Lars Liden<\/a>, Sungjin Lee<\/p>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis<br \/>\n<strong>MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search<\/strong><br \/>\nNaihan Li, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Yanqing Liu, Sheng Zhao, Ming Liu<\/p>\n<p>19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis<br \/>\n<strong>MultiSpeech: Multi-Speaker Text to Speech with Transformer<\/strong><br \/>\nMingjian Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/taoqin\/\">Tao Qin<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tyliu\/\">Tie-Yan Liu<\/a><\/p>\n<p>20:30 \u2013 21:30 | Speech Synthesis: Prosody Modeling<br \/>\n<strong>Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency<\/strong><br \/>\nMatt Whitehill, Shuang Ma, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/damcduff\/\">Daniel McDuff<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yalesong\/\">Yale Song<\/a><\/p>\n<p>21:45 \u2013 22:45 | Multilingual and code-switched ASR<br \/>\n<strong>Improving Low Resource Code-switched ASR using Augmented Code-switched TTS<\/strong><br \/>\nYash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi<\/p>\n<p>21:45 \u2013 22:45 | ASR neural network architectures II \u2013 Transformers<br \/>\n<strong>Exploring Transformers for Large-Scale Speech Recognition<\/strong><br \/>\nLiang Lu, Changliang Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- \/wp:msr\/content-tabs -->","tab-content":[{"id":0,"name":"About","content":"Microsoft is proud to be a gold sponsor of <a href=\"http:\/\/www.interspeech2020.org\/\" target=\"_blank\" rel=\"noopener\">INTERSPEECH 2020<\/a>. See more details on our contributions on the sessions tab."},{"id":1,"name":"Sessions","content":"<em>All times are displayed in GMT +8<\/em>\r\n<h2>Sunday, October 25<\/h2>\r\n20:00 \u2013 21:30 | Tutorial B-2-1\r\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a>\r\n\r\n20:00 \u2013 21:30 | Tutorial B-3-1\r\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong>\r\nKyu J. 
Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>\r\n\r\n21:45 \u2013 23:15 | Tutorial B-2-2\r\n<strong>Neural Approaches to Conversational Information Retrieval<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jfgao\/\">Jianfeng Gao<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cxiong\/\">Chenyan Xiong<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/pauben\/\">Paul Bennett<\/a>\r\n\r\n21:45 \u2013 23:15 | Tutorial B-3-2\r\n<strong>Neural Models for Speaker Diarization in the Context of Speech Recognition<\/strong>\r\nKyu J. Han, Tae Jin Park, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>\r\n<h2>Monday, October 26<\/h2>\r\n19:15 \u2013 20:15 | ASR neural network architectures I\r\n<strong>On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition<\/strong> (Microsoft Research Asia)\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>\r\n\r\n19:15 \u2013 20:15 | ASR neural network architectures I\r\n<strong>Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, Zhuo Chen, Tianyan Zhou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>\r\n\r\n19:15 \u2013 20:15 | Multi-channel speech enhancement\r\n<strong>Online directional speech enhancement using geometrically constrained independent vector analysis<\/strong>\r\nLi Li, Kazuhito 
Koishida, Shoji Makino\r\n\r\n19:15 \u2013 20:15 | Multi-channel speech enhancement\r\n<strong>An End-to-end Architecture of Online Multi-channel Speech Separation<\/strong>\r\nJian Wu, Zhuo Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>, Zhili Tan\r\n\r\n19:15 \u2013 20:15 | Speech Signal Representation\r\n<strong>Robust pitch regression with voiced\/unvoiced classification in nonstationary noise environments<\/strong>\r\nDung Tran, Uros Batricevic, Kazuhito Koishida\r\n\r\n19:15 \u2013 20:15 | Speaker Diarization\r\n<strong>Online Speaker Diarization with Relation Network<\/strong>\r\nXiang Li, Yucheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cluo\/\">Chong Luo<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/wezeng\/\">Wenjun Zeng<\/a>\r\n\r\n19:15 \u2013 20:15 | Speaker Diarization\r\n<strong>Speaker attribution with voice profiles by graph-based semi-supervised learning<\/strong>\r\nJixuan Wang (University of Toronto), Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz (University of Toronto), Michael Brudno (University of Toronto)\r\n\r\n19:15 \u2013 20:15 | Noise robust and distant speech recognition\r\n<strong>Neural Speech Separation Using Spatially Distributed Microphones<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/dowan\/\">Dongmei Wang<\/a>, Zhuo Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>\r\n\r\n20:30 \u2013 21:30 | ASR neural network architectures and training I\r\n<strong>Fast and Slow Acoustic Model<\/strong>\r\nKshitiz Kumar, Emilian Stoimenov, Hosam Khalil, Jian Wu\r\n\r\n20:30 \u2013 21:30 | Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation\r\n<strong>Neural Zero-Inflated Quality Estimation Model For Automatic 
Speech Recognition System<\/strong>\r\nKai Fan, Bo Li, Jiayi Wang, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhi-Jie Yan\r\n\r\n20:30 \u2013 21:30 | ASR model training and strategies\r\n<strong>Semantic Mask for Transformer based End-to-End Speech Recognition<\/strong>\r\nChengyi Wang, Yu Wu, Yujiao Du, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a>\r\n\r\n20:30 \u2013 21:30 | ASR model training and strategies\r\n<strong>A Federated Approach in Training Acoustic Models<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, Kenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>\r\n\r\n21:45 \u2013 22:45 | Cross\/multi-lingual and code-switched speech recognition\r\n<strong>A 43 Language Multilingual Punctuation Prediction Neural Network Model<\/strong>\r\nXinxing Li, Edward Lin\r\n\r\n21:45 \u2013 22:45 | Singing Voice Computing and Processing in Music\r\n<strong>Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music<\/strong>\r\nYuanbo Hou, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a>, Jian Luan, Shengchen Li\r\n\r\n21:45 \u2013 22:45 | Acoustic model adaptation for ASR\r\n<strong>Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator<\/strong>\r\nYan Huang, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Lei He, 
Wenning Wei, William Gale, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>\r\n\r\n21:45 \u2013 22:45 | Singing and Multimodal Synthesis\r\n<strong>Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer<\/strong>\r\nJie Wu, Jian Luan\r\n\r\n21:45 \u2013 22:45 | Singing and Multimodal Synthesis\r\n<strong>XiaoiceSing: A High-Quality and Integrated Singing Voice Synthesis System<\/strong>\r\nPeiling Lu, Jie Wu, Jian Luan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Li Zhou\r\n\r\n21:45 \u2013 22:45 | Student Events\r\n<strong>ISCA-SAC: 2nd Mentoring Event<\/strong>\r\nMentor: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>\r\n<h2>Tuesday, October 27<\/h2>\r\n19:15 \u2013 20:15 | Feature extraction and distant ASR\r\n<strong>Bandpass Noise Generation and Augmentation for Unified ASR<\/strong>\r\nKshitiz Kumar, Bo Ren, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu\r\n\r\n19:15 \u2013 20:15 | Search for speech recognition\r\n<strong>Combination of end-to-end and hybrid models for speech recognition<\/strong>\r\nJeremy Heng Meng Wong, Yashesh Gaur, Rui Zhao, Liang Lu, Eric Sun, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>\r\n<h2>Wednesday, October 28<\/h2>\r\n19:15 \u2013 20:15 | Streaming ASR\r\n<strong>1-D Row-Convolution LSTM: Fast Streaming ASR at Accuracy Parity with LC-BLSTM<\/strong>\r\nKshitiz Kumar, Chaojun Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>, Jian Wu\r\n\r\n19:15 \u2013 20:15 | Streaming ASR\r\n<strong>Low Latency End-to-End Streaming Speech Recognition with a Scout Network<\/strong>\r\nChengyi Wang, Yu Wu, Liang 
Lu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Guoli Ye, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/mingzhou\/\">Ming Zhou<\/a>\r\n\r\n19:15 \u2013 20:15 | Streaming ASR\r\n<strong>Transfer Learning Approaches for Streaming End-to-End Speech Recognition System<\/strong>\r\nVikas Joshi, Rui Zhao, Rupesh Mehta, Kshitiz Kumar, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>\r\n\r\n19:15 \u2013 20:15 | Applications of ASR\r\n<strong>SpecMark: A Spectral Watermarking Framework for IP Protection of Speech Recognition Systems<\/strong>\r\nHuili Chen, Bita Darvish Rouhani, Farinaz Koushanfar\r\n\r\n19:15 \u2013 20:15 | Single-channel speech enhancement I\r\n<strong>Low-Latency Single Channel Speech Dereverberation using U-Net Convolutional Neural Networks<\/strong>\r\nAhmet E. Bulut, Kazuhito Koishida\r\n\r\n19:15 \u2013 20:15 | Single-channel speech enhancement I\r\n<strong>Single-channel speech enhancement by subspace affinity minimization<\/strong>\r\nDung Tran, Kazuhito Koishida\r\n\r\n19:15 \u2013 20:15 | Deep Noise Suppression Challenge\r\n<strong>The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results<\/strong>\r\nChandan Karadagur Ananda Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sebraun\/\">Sebastian Braun<\/a>, Puneet Rana, Sriram Srinivasan, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/chiw\/\">Johannes Gehrke<\/a>\r\n\r\n20:30 \u2013 21:30 | Spoken Term Detection\r\n<strong>Re-weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting<\/strong>\r\nKun Zhang, Zhiyong Wu, Daode Yuan, Jian Luan, Jia 
Jia, Helen Meng, Binheng Song\r\n\r\n20:30 \u2013 21:30 | Training strategies for ASR\r\n<strong>Serialized Output Training for End-to-End Overlapped Speech Recognition<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nakanda\/\">Naoyuki Kanda<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xiaofewa\/\">Xiaofei Wang<\/a>, Zhong Meng, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tayoshio\/\">Takuya Yoshioka<\/a>\r\n\r\n20:30 \u2013 21:30 | Speech transmission &amp; coding\r\n<strong>An Open source Implementation of ITU-T Recommendation P.808 with Validation<\/strong>\r\nBabak Naderi, Ross Cutler\r\n\r\n20:30 \u2013 21:30 | Speech transmission &amp; coding\r\n<strong>DNN No-Reference PSTN Speech Quality Prediction<\/strong>\r\nGabriel Mittag, Ross Cutler, Yasaman Hosseinkashi, Michael Revow, Sriram Srinivasan, Naglakshmi Chande, Robert Aichner\r\n\r\n20:30 \u2013 21:30 | Speech Synthesis: Multilingual and Cross-lingual approaches\r\n<strong>On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model<\/strong>\r\nShubham Bansal, Arijit Mukherjee, Sandeepkumar Satpal, Rupesh Mehta\r\n\r\n21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II\r\n<strong>Towards Universal Text-to-Speech<\/strong>\r\nJingzhou Yang, Lei He\r\n\r\n21:45 \u2013 22:45 | Speech Synthesis Paradigms and Methods II\r\n<strong>Enhancing Monotonicity for Robust Autoregressive Transformer TTS<\/strong>\r\nXiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao\r\n\r\n21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion\r\n<strong>Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis<\/strong>\r\nYukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda\r\n\r\n21:45 \u2013 22:45 | Speech Synthesis: Prosody and Emotion\r\n<strong>GAN-based Data Generation for Speech Emotion Recognition<\/strong>\r\n<a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Kenichi Kumatani\r\n\r\n21:45 \u2013 22:45 | Student Events\r\n<strong>ISCA-SAC: 7th Students Meet the Experts<\/strong>\r\nPanelist: <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/interspeech-2018-special-session-low-resource-speech-recognition-challenge-indian-languages\/\">Sunayana Sitaram<\/a>\r\n<h2>Thursday, October 29<\/h2>\r\n19:15 \u2013 20:15 | Speech Synthesis: Neural Waveform Generation II\r\n<strong>An Efficient Subband Linear Prediction for LPCNet-based Neural Synthesis<\/strong>\r\nYang Cui, Xi Wang, Lei He, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/frankkps\/\">Frank Soong<\/a>\r\n\r\n19:15 \u2013 20:15 | ASR neural network architectures and training II\r\n<strong>Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/sarangp\/\">Sarangarajan Parthasarathy<\/a>, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>\r\n\r\n19:15 \u2013 20:15 | New Trends in self-supervised speech processing\r\n<strong>Sequence-level Self-learning with Multiple Hypotheses<\/strong>\r\nKenichi Kumatani, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/didimit\/\">Dimitrios Dimitriadis<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rogmyr\/\">Robert Gmyr<\/a>, Yashesh Gaur, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/seeskime\/\">Sefik Emre 
Eskimez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a>\r\n\r\n19:15 \u2013 20:15 | Spoken Dialogue System\r\n<strong>Discriminative Transfer Learning for Optimizing ASR and Semantic Labeling in Task-oriented Spoken Dialog<\/strong>\r\n<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yaoqian\/\">Yao Qian<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yushi\/\">Yu Shi<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nzeng\/\">Michael Zeng<\/a>\r\n\r\n19:15 \u2013 20:15 | Spoken Dialogue System\r\n<strong>Datasets and Benchmarks for Task-Oriented Log Dialogue Ranking Task<\/strong>\r\nXinnuo Xu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yizzhang\/\">Yizhe Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/laliden\/\">Lars Liden<\/a>, Sungjin Lee\r\n\r\n19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis\r\n<strong>MoBoAligner: a Neural Alignment Model for Non-autoregressive TTS with Monotonic Boundary Search<\/strong>\r\nNaihan Li, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shujliu\/\">Shujie Liu<\/a>, Yanqing Liu, Sheng Zhao, Ming Liu\r\n\r\n19:15 \u2013 20:15 | Speech Synthesis: Toward End-to-End Synthesis\r\n<strong>MultiSpeech: Multi-Speaker Text to Speech with Transformer<\/strong>\r\nMingjian Chen, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/xuta\/\">Xu Tan<\/a>, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/taoqin\/\">Tao Qin<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tyliu\/\">Tie-Yan Liu<\/a>\r\n\r\n20:30 \u2013 21:30 | Speech Synthesis: Prosody Modeling\r\n<strong>Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency<\/strong>\r\nMatt Whitehill, Shuang Ma, <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/damcduff\/\">Daniel McDuff<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yalesong\/\">Yale Song<\/a>\r\n\r\n21:45 \u2013 22:45 | Multilingual and code-switched ASR\r\n<strong>Improving Low Resource Code-switched ASR using Augmented Code-switched TTS<\/strong>\r\nYash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi\r\n\r\n21:45 \u2013 22:45 | ASR neural network architectures II \u2013 Transformers\r\n<strong>Exploring Transformers for Large-Scale Speech Recognition<\/strong>\r\nLiang Lu, Changliang Liu, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinyli\/\">Jinyu Li<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ygong\/\">Yifan Gong<\/a>"}],"msr_startdate":"2020-10-25","msr_enddate":"2020-10-29","msr_event_time":"","msr_location":"Virtual","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"October 25, 2020","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Microsoft is proud to be a gold sponsor of INTERSPEECH 
2020.","msr_research_lab":[],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[810823,810844],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/699085","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":5,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/699085\/revisions"}],"predecessor-version":[{"id":1146925,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/699085\/revisions\/1146925"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=699085"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=699085"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=699085"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=699085"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=699085"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=699085"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=699085"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\
/msr-post-option?post=699085"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=699085"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}