Microsoft @ ICASSP 2020

Tuesday, May 5

11:30 – 13:30 CEST

MLSP-P2: Applications in Speech and Audio
Multi-Label Sound Event Retrieval Using A Deep Learning-Based Siamese Structure With A Pairwise Presence Matrix
Jianyu Fan, Eric Nichols, Daniel Tompkins, Ana Elisa Méndez Méndez, Benjamin Elizalde, Philippe Pasquier

11:50 – 12:10 CEST

SPE-L1: End-to-end Speech Recognition I: Streaming
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR
Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong

16:30 – 18:30 CEST

SPE-P3: Machine Learning for Speech Synthesis I
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS
Yujia Xiao, Lei He, Huaiping Ming, Frank K. Soong

17:30 – 17:50 CEST

AUD-L2: Deep Learning for Source Separation
Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation
Yi Luo, Zhuo Chen, Takuya Yoshioka

Wednesday, May 6

9:00 – 11:00 CEST

AUD-P4: Feedback, Noise, and Reverberation
Joint Beamforming and Reverberation Cancellation Using a Constrained Kalman Filter with Multichannel Linear Prediction
Sahar Hashemgeloogerdi, Sebastian Braun

AUD-P4: Feedback, Noise, and Reverberation
Predicting Word Error Rate for Reverberant Speech
Hannes Gamper, Dimitra Emmanouilidou, Sebastian Braun, Ivan Tashev

SPE-P5: Deep Speaker Recognition Models
Improving Deep CNN Networks with Long Temporal Context for Text-independent Speaker Verification
Yong Zhao, Tianyan Zhou, Zhuo Chen, Jian Wu

9:20 – 9:40 CEST

SPE-L6: Speech Enhancement II: Single Channel
Low-Latency Single Channel Speech Enhancement Using U-Net Convolutional Neural Networks
Ahmet E. Bulut, Kazuhito Koishida

11:30 – 13:30 CEST

SAM-P3: Sparsity, Super-Resolution and Imaging
Low-Rank Toeplitz Matrix Estimation Via Random Ultra-Sparse Rulers
Hannah Lawrence, Jerry Li, Cameron Musco, Christopher Musco

SPE-P8: Robust Speech Recognition
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition
Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky

16:30 – 16:50 CEST

IFS-L2: Privacy, Biometrics and Information Security
Privacy-Preserving Phishing Web Page Classification Via Fully Homomorphic Encryption
Edward Chou, Arun Gururajan, Kim Laine, Nitin Kumar Goel, Anna Bertiger, Jack W. Stokes

16:30 – 18:30 CEST

HLT-P1: Spoken Language Understanding and Dialogue I
Fast Domain Adaptation for Goal-Oriented Dialogue Using A Hybrid Generative-Retrieval Transformer
Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz

SPE-P9: End-to-end Speech Recognition III: General Topics
Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition
Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Thursday, May 7

9:00 – 11:00 CEST

HLT-P2: Speech and Language Analysis
Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings
Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke

10:20 – 10:40 CEST

AUD-L6: Acoustic Environments and Spatial Audio II
Fast Acoustic Scattering Using Convolutional Neural Networks
Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi

10:40 – 11:00 CEST

SPE-L11: Speech Separation and Extraction I: Single Channel
An Online Speaker-Aware Speech Separation Approach Based on Time-Domain Representation
Hui Wang, Yan Song, Zeng-Xi Li, Ian McLoughlin, Li-Rong Dai

11:30 – 13:30 CEST

SPE-P12: Machine Learning for Speech Synthesis II
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network
Min-Jae Hwang, Eunwoo Song, Ryuichi Yamamoto, Frank Soong, Hong-Goo Kang

SPE-P13: Speech Separation and Extraction III
Continuous Speech Separation: Dataset and Analysis
Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li

12:10 – 12:30 CEST

SPE-L12: Speech Separation and Extraction II: Multi-channel
End-to-End Microphone Permutation and Number Invariant Multi-Channel Speech Separation
Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka

16:30 – 18:30 CEST

MMSP-P3: Multimedia Signal Processing
Supervised Deep Hashing for Efficient Audio Event Retrieval
Arindam Jati, Dimitra Emmanouilidou

MMSP-P3: Multimedia Signal Processing
Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing
Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

SPE-P15: Speech Recognition: Adaptation
L-Vector: Neural Label Embedding for Domain Adaptation
Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

SPE-P15: Speech Recognition: Adaptation
Acoustic Model Adaptation for Presentation Transcription and Intelligent Meeting Assistant Systems
Yan Huang, Yifan Gong

SPE-P15: Speech Recognition: Adaptation
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation
Yan Huang, Lei He, Wenning Wei, William Gale, Jinyu Li, Yifan Gong

SS-P1: Signal Processing Education: Trends and Innovations
A Dataset for Measuring Reading Levels in India at Scale
Dolly Agarwal, Jayant Gupchup, Nishant Baghel

17:30 – 17:30 CEST

IDSP-L2: Industry Session on Large-Scale Distributed Learning Strategies
Parallelizing Adam Optimizer with Blockwise Model-Update Filtering
Kai Chen, Haisong Ding, Qiang Huo

Friday, May 8

8:00 – 10:00 CEST

IFS-P1: Information Hiding, Biometrics and Security
Texception: A Character/Word-Level Deep Learning Model for Phishing URL Detection
Farid Tajaddodianfar, Jack W. Stokes, Arun Gururajan

SAM-P6: Detection, Estimation and Classification
Static Visual Spatial Priors For DOA Estimation
Pawel Swietojanski, Ondrej Miksik

SPE-P16: Word Spotting
Adaptation of RNN Transducer with Text-to-Speech Technology for Keyword Spotting
Eva Sharma, Guoli Ye, Wenning Wei, Rui Zhao, Yao Tian, Jian Wu, Lei He, Ed Lin, Yifan Gong

SPE-P17: Speech Enhancement IV
AV(SE)²: Audio-Visual Squeeze-Excite Speech Enhancement
Michael Iuzzolino, Kazuhito Koishida

8:20 – 8:40 CEST

HLT-L2: Language Modeling
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers
Junhao Xu, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Mei-Ling Meng

9:40 – 10:00 CEST

MLSP-L10: Deep Neural Network Structures
Neural Attentive Multiview Machines
Oren Barkan, Ori Katz, Noam Koenigstein

11:45 – 13:45 CEST

AUD-P11: Signal Enhancement and Restoration II
Geometrically Constrained Independent Vector Analysis for Directional Speech Enhancement
Li Li, Kazuhito Koishida

AUD-P11: Signal Enhancement and Restoration II
Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement
Yangyang Xia, Sebastian Braun, Chandan Reddy, Harishchandra Dubey, Ross Cutler, Ivan Tashev

HLT-P5: Multilingual Processing of Language
Addressing Accent Mismatch in Mandarin-English Code-Switching Speech Recognition
Zhili Tan, Xinghua Fan, Hui Zhu, Ed Lin

IFS-P2: Anonymization, Security and Privacy
Detection of Malicious VBScript Using Static and Dynamic Analysis with Recurrent Deep Learning
Jack W. Stokes, Rakshit Agrawal, Geoff McDonald

SPE-P19: Machine Learning for Speech Synthesis III
ESPNET-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

SPE-P20: Speech Recognition: Acoustic Modelling II
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model
Jinyu Li, Rui Zhao, Eric Sun, Jeremy Wong, Amit Das, Zhong Meng, Yifan Gong

12:25 – 12:45 CEST

SPE-L16: Speaker Diarization
Speaker Diarization with Session-Level Speaker Embedding Refinement Using Graph Neural Networks
Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

13:05 – 13:25 CEST

SPE-L16: Speaker Diarization
A Memory Augmented Architecture for Continuous Speaker Identification in Meetings
Nikolaos Flemotomos, Dimitrios Dimitriadis

15:15 – 17:15 CEST

SPE-P21: Voice Conversion
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data
Feng-Long Xie, Xin-Hui Li, Bo Liu, Yi-Bin Zheng, Li Meng, Li Lu, Frank K. Soong

16:15 – 16:30 CEST

MLSP-L11: Attention Needs
Attentive Item2vec: Neural Attentive User Representations
Oren Barkan, Avi Caciularu, Ori Katz, Noam Koenigstein