I am a Principal Research Manager in the Azure Cognitive Services Research Group of Microsoft, where I lead efforts on conversation transcription and real-time speech enhancement in the Speech Research Team. Our recent research topics include speech separation and enhancement, end-to-end multi-talker speech recognition and speaker diarization, self-supervised learning for speech, and multi-modal learning. With my team members and colleagues at Microsoft, I helped develop several capabilities of Microsoft Azure Speech Services, including the Speech Devices SDK and Conversation Transcription, which power the transcription features of several Microsoft products.
Before joining Microsoft in 2016, I worked at NTT Communication Science Laboratories, where, with my colleagues, I co-invented the weighted prediction error (WPE) method, a dereverberation algorithm widely used in both the research community and industry. I also conducted research at the University of Cambridge as a Visiting Scholar in 2013 and taught at Doshisha University as a Part-Time Lecturer in 2015.
Our ICASSP 2022 paper received the IEEE Signal Processing Society Conference Best Paper Award for Industry. I have been a member of the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society since 2018. I served as an organizing committee member for the REVERB Challenge and the accompanying workshop in 2013.
My publication list on this website may not be up to date. For a current list, please see my Google Scholar profile.
The ability to perceive communication signals and make sense of them played an essential role in the evolution of human intelligence. Computing technology is following the same trajectory. Now, computer vision and automatic speech recognition (ASR) technologies have enabled the…
Recent advances in machine learning and signal processing, together with the availability of massive computing power, have produced dramatic and steady improvements in speech recognition accuracy. Voice interfaces to digital devices have become increasingly common. Lectures…