Microsoft at ICASSP 2021

About

Microsoft is proud to be a Silver sponsor of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) event. See more details on our contributions below.

 

Session Chairs

The following Microsoft researchers will chair sessions at the conference.

Zhuo Chen
Hannes Gamper
Yifan Gong
Jinyu Li
Zhong Meng
Chandan K A Reddy
Ivan Tashev
Takuya Yoshioka

Sessions

All times are displayed in Eastern Daylight Time (UTC -4)

Monday, June 7

10:00 – 13:30 | Tutorial

Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization

Presenters: Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe

18:00 – 19:00

Young Professionals Panel Discussion

Moderator: Subhro Das
Panelists: Sabrina Rashid, Vanessa Testoni, Hamid Palangi


Tuesday, June 8

13:00 – 13:45 | Speech Synthesis 1: Architecture

Lightspeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen, Tie-Yan Liu

13:00 – 13:45 | Speech Synthesis 1: Architecture

A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time

Feng-Long Xie, Xin-Hui Li, Wen-Chao Su, Li Lu, Frank K. Soong

13:00 – 13:45 | Language Modeling 1: Fusion and Training for End-to-End ASR

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition

Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

13:00 – 13:45 | Audio and Speech Source Separation 1: Speech Separation

Session Chair: Zhuo Chen

Rethinking The Separation Layers In Speech Separation Networks

Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani

13:00 – 13:45 | Deep Learning Training Methods 3

Session Chair: Jinyu Li

13:00 – 13:45 | Brain-Computer Interfaces

Decoding Music Attention from “EEG Headphones”: A User-Friendly Auditory Brain-Computer Interface

Wenkang An, Barbara Shinn-Cunningham, Hannes Gamper, Dimitra Emmanouilidou, David Johnston, Mihai Jalobeanu, Edward Cutrell, Andrew Wilson, Kuan-Jung Chiang, Ivan Tashev

14:00 – 14:45 | Speech Enhancement 1: Speech Separation

Session Chair: Takuya Yoshioka

Dual-Path Modeling for Long Recording Speech Separation in Meetings

Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian

14:00 – 14:45 | Speech Enhancement 1: Speech Separation

Continuous Speech Separation with Conformer

Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

14:00 – 14:45 | Speech Enhancement 2: Speech Separation and Dereverberation

Session Chair: Takuya Yoshioka

14:00 – 14:45 | Speaker Recognition 1: Benchmark Evaluation

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020

Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong

14:00 – 14:45 | Dialogue Systems 2: Response Generation

Topic-Aware Dialogue Generation with Two-Hop Based Graph Attention

Shijie Zhou, Wenge Rong, Jianfei Zhang, Yanmeng Wang, Libin Shi, Zhang Xiong

16:30 – 17:15 | Speech Recognition 4: Transformer Models 2

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset

Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li

16:30 – 17:15 | Active Noise Control, Echo Reduction, and Feedback Reduction 2: Active Noise Control and Echo Cancellation

Session Chair: Hannes Gamper

ICASSP 2021 Acoustic Echo Cancellation Challenge: Datasets, Testing Framework, and Results

Kusha Sridhar, Ross Cutler, Ando Saabas, Tanel Parnamaa, Markus Loide, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

16:30 – 17:15 | Learning

Session Chair: Zhong Meng

Sequence-Level Self-Teaching Regularization

Eric Sun, Liang Lu, Zhong Meng, Yifan Gong


Wednesday, June 9

13:00 – 13:45 | Language Understanding 1: End-to-end Speech Understanding 1

Speech-Language Pre-Training for End-to-End Spoken Language Understanding

Yao Qian, Ximo Bian, Yu Shi, Naoyuki Kanda, Leo Shen, Zhen Xiao, Michael Zeng

13:00 – 13:45 | Audio and Speech Source Separation 4: Multi-Channel Source Separation

DBnet: DOA-Driven Beamforming Network for End-to-End Reverberant Sound Source Separation

Ali Aroudi, Sebastian Braun

14:00 – 14:45 | Speech Enhancement 4: Multi-channel Processing

Don’t Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer

Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

14:00 – 14:45 | Matrix Factorization and Applications

Cold Start Revisited: A Deep Hybrid Recommender with Cold-Warm Item Harmonization

Oren Barkan, Roy Hirsch, Ori Katz, Avi Caciularu, Yoni Weill, Noam Koenigstein

14:00 – 14:45 | Biological Image Analysis

CMIM: Cross-Modal Information Maximization For Medical Imaging

Tristan Sylvain, Francis Dutil, Tess Berthier, Lisa Di Jorio, Margaux Luck, Devon Hjelm, Yoshua Bengio

15:30 – 16:15 | Speech Recognition 8: Multilingual Speech Recognition

Multi-Dialect Speech Recognition in English Using Attention on Ensemble of Experts

Amit Das, Kshitiz Kumar, Jian Wu

15:30 – 16:15 | Quality and Intelligibility Measures

MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network

Yichong Leng, Xu Tan, Sheng Zhao, Frank K. Soong, Xiang-Yang Li, Tao Qin

15:30 – 16:15 | Quality and Intelligibility Measures

Crowdsourcing Approach for Subjective Evaluation of Echo Impairment

Ross Cutler, Babak Nadari, Markus Loide, Sten Sootla, Ando Saabas

16:30 – 17:15 | Speech Recognition 9: Confidence Measures

Session Chair: Yifan Gong

16:30 – 17:15 | Speech Recognition 10: Robustness to Human Speech Variability

Session Chair: Yifan Gong

16:30 – 17:15 | Speech Processing 2: General Topics

DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Chandan K A Reddy, Vishak Gopal, Ross Cutler

16:30 – 17:15 | Style and Text Normalization

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-Trained Language Model

Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

16:30 – 17:15 | Modeling, Analysis and Synthesis of Acoustic Environments 3: Acoustic Analysis

Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks

Ziqi Fan, Vibhav Vineet, Chenshen Lu, T.W. Wu, Kyla McMullen


Thursday, June 10

13:00 – 13:45 | Speech Recognition 11: Novel Approaches

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

13:00 – 13:45 | Speech Synthesis 5: Prosody & Style

Speech BERT Embedding for Improving Prosody in Neural TTS

Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

13:00 – 13:45 | Speech Synthesis 6: Data Augmentation & Adaptation

Adaspeech 2: Adaptive Text to Speech with Untranscribed Data

Yuzi Yan, Xu Tan, Bohan Li, Tao Qin, Sheng Zhao, Yuan Shen, Tie-Yan Liu

14:00 – 14:45 | Speech Enhancement 5: DNS Challenge Task

Session Chair: Chandan K A Reddy

ICASSP 2021 Deep Noise Suppression Challenge

Chandan K A Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

14:00 – 14:45 | Speech Enhancement 6: Multi-modal Processing

Session Chair: Chandan K A Reddy

14:00 – 14:45 | Graph Signal Processing

Fast Hierarchy Preserving Graph Embedding via Subspace Constraints

Xu Chen, Lun Du, Mengyuan Chen, Yun Wang, QingQing Long, Kunqing Xie

15:30 – 16:15 | Speech Recognition 13: Acoustic Modeling 1

Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings

Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka

15:30 – 16:15 | Speech Recognition 14: Acoustic Modeling 2

Ensemble Combination between Different Time Segmentations

Jeremy Heng Meng Wong, Dimitrios Dimitriadis, Kenichi Kumatani, Yashesh Gaur, George Polovets, Partha Parthasarathy, Eric Sun, Jinyu Li, Yifan Gong

15:30 – 16:15 | Privacy and Information Security

Detection Of Malicious DNS and Web Servers using Graph-Based Approaches

Jinyuan Jia, Zheng Dong, Jie Li, Jack W. Stokes

16:30 – 17:15 | Language Assessment

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples

Bin Su, Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Zhiyong Wu

16:30 – 17:15 | Signal Enhancement and Restoration 1: Deep Learning

Towards Efficient Models for Real-Time Deep Noise Suppression

Sebastian Braun, Hannes Gamper, Chandan K A Reddy, Ivan Tashev

16:30 – 17:15 | Signal Enhancement and Restoration 3: Signal Enhancement

Phoneme-Based Distribution Regularization for Speech Enhancement

Yajing Liu, Xiulian Peng, Zhiwei Xiong, Yan Lu

16:30 – 17:15 | Audio & Images

Session Chair: Ivan Tashev


Friday, June 11

11:30 – 12:15 | Speech Recognition 18: Low Resource ASR

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition

Linghui Meng, Jin Xu, Xu Tan, Jindong Wang, Tao Qin, Bo Xu

11:30 – 12:15 | Speech Synthesis 7: General Topics

Denoispeech: Denoising Text to Speech with Frame-Level Noise Modeling

Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

13:00 – 13:45 | Speech Enhancement 8: Echo Cancellation and Other Tasks

Cascaded Time + Time-Frequency Unet For Speech Enhancement: Jointly Addressing Clipping, Codec Distortions, And Gaps

Arun Asokan Nair, Kazuhito Koishida

13:00 – 13:45 | Speaker Diarization

Hidden Markov Model Diarisation with Speaker Location Information

Jeremy Heng Meng Wong, Xiong Xiao, Yifan Gong

13:00 – 13:45 | Detection and Classification of Acoustic Scenes and Events 5: Scenes

Cross-Modal Spectrum Transformation Network for Acoustic Scene Classification

Yang Liu, Alexandros Neophytou, Sunando Sengupta, Eric Sommerlade

Grand Challenges

ICASSP 2021 Acoustic Echo Cancellation Challenge

The ICASSP 2021 Acoustic Echo Cancellation Challenge is intended to stimulate research in the area of acoustic echo cancellation (AEC), which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 17 submissions for the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Acoustic Echo Cancellation Challenge.

 

1st place

Organization: Amazon
Authors: Jean-Marc Valin, Srikanth Tenneti, Karim Helwani, Umut Isik, Arvindh Krishnaswamy
Paper: Low-Complexity, Real-Time Joint Neural Echo Control and Speech Enhancement Based On PercepNet


2nd place

Organization: SoundConnect and Alibaba
Authors: Ziteng Wang, Yueyue Na, Zhang Liu, Biao Tian, Qiang Fu
Paper: Weighted recursive least square filter and neural network based residual echo suppression for the AEC-Challenge


3rd place

Organization: Carl von Ossietzky University Oldenburg
Authors: Nils L. Westhausen, Bernd T. Meyer
Paper: Acoustic echo cancellation with the dual-signal transformation LSTM network

 

ICASSP 2021 Deep Noise Suppression (DNS) Challenge

The ICASSP 2021 Deep Noise Suppression (DNS) Challenge is intended to stimulate research in the area of noise suppression, which is an important part of speech enhancement and still a top issue in audio communication and conferencing systems. We received 19 submissions for the challenge from industry and academia. Microsoft is happy to announce the winners of the ICASSP 2021 Deep Noise Suppression Challenge.

 

1st place

Organization: Institute of Acoustics, Chinese Academy of Sciences
Authors: Andong Li, Wenzhe Liu, Xiaoxue Luo, Chengshi Zheng, Xiaodong Li
Paper: ICASSP 2021 DEEP NOISE SUPPRESSION CHALLENGE: DECOUPLING MAGNITUDE AND PHASE OPTIMIZATION WITH A TWO-STAGE DEEP NETWORK


2nd place

Organization: Sogou
Authors: Jingdong Li, Dawei Luo, Yun Liu, Yuanyuan Zhu, Zhaoxia Li, Guohui Cui, Wenqi Tang, Wei Chen
Paper: Densely Connected Multi-Stage Model with Channel Wise Subband Feature for Real-Time Speech Enhancement


3rd place

Organization: Seoul National University, Supertone
Authors: Hyeong-Seok Choi, Sungjin Park, Jie Hwan Lee, Hoon Heo, Dongsuk Jeon, Kyogu Lee
Paper: REAL-TIME DENOISING AND DEREVERBERATION WITH TINY RECURRENT U-NET
