I am a speech technology researcher at Microsoft. My research aims to create computers that can recognize human speech in complex acoustic scenes. My current work focuses on multi-talker speech recognition and audio-visual speaker diarization in far-field settings. My areas of technical expertise include blind source separation, blind dereverberation, microphone arrays, acoustic modeling, far-field speech recognition, and applications of deep neural networks to audio and speech processing.
Prior to joining Microsoft in 2016, I worked at NTT Communication Science Laboratories in Japan as a Research Scientist for ten years. I also conducted research at the University of Cambridge as a Visiting Scholar in 2013 and taught at Doshisha University as a Part-Time Lecturer in 2015. At NTT, I led the development of its CHiME-3 far-field speech recognition system, which won the challenge by a significant margin over the other entries and popularized the mask-based acoustic beamforming technique. I also contributed to the development of NTT’s REVERB Challenge system, which ranked best in both the single- and multi-microphone categories. With my colleagues, I invented several dereverberation algorithms, collectively known as the weighted prediction error (WPE) method, which are used in commercial products.
I have been a member of the Speech and Language Processing Technical Committee (SLTC) of the IEEE Signal Processing Society since 2018. I served as an organizing committee member for the REVERB Challenge and the accompanying workshop in 2013.