Environmental robustness to speech recognition via speech spectral and cepstral feature degradation models

  • Kshitiz Kumar | Carnegie Mellon

The talk will present some of the algorithms developed as part of my graduate work at Carnegie Mellon. Speech is the natural medium of communication for humans, and in the last decade various speech technologies, such as automatic speech recognition (ASR) and voice response systems, have matured considerably. These systems rely on the clarity of the captured speech, but many real-world environments include noise and reverberation that degrade system performance. The key focus of the talk will be on ASR robustness to reverberation. We first provide a new framework to adequately and efficiently represent the problem of reverberation in the speech spectral and cepstral feature domains, and then develop different dereverberation algorithms within the proposed framework. The algorithms reduce the uncertainty involved in dereverberation tasks by using speech knowledge in terms of cepstral auto-correlation, cepstral distribution, and the non-negativity and sparsity of spectral values. We demonstrate the success of our algorithms under clean-training as well as matched-training conditions.
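The abstract does not spell out the spectral-domain reverberation model, but a common simplification in this line of work is that reverberation smears energy across time: each frequency channel of the reverberant power spectrogram is approximately the clean channel convolved along time with a frame-level energy envelope of the room impulse response. The sketch below illustrates that idea only; the function name, the envelope, and the toy data are assumptions for illustration, not the talk's actual formulation.

```python
import numpy as np

def reverberant_spectrum(clean_spec, rir_env):
    """Approximate a reverberant power spectrogram by convolving each
    frequency channel of the clean spectrogram (frames x freq bins)
    along time with a frame-level energy envelope of the room impulse
    response. Illustrative simplification, not the talk's exact model."""
    n_frames, _ = clean_spec.shape
    out = np.zeros_like(clean_spec)
    for f in range(clean_spec.shape[1]):
        # convolve along the time axis, truncated to the original length
        out[:, f] = np.convolve(clean_spec[:, f], rir_env)[:n_frames]
    return out

# toy example: 10 frames, 4 frequency bins, a 3-frame decaying envelope
rng = np.random.default_rng(0)
clean = np.abs(rng.standard_normal((10, 4))) ** 2
env = np.array([1.0, 0.5, 0.25])   # assumed energy decay per frame
rev = reverberant_spectrum(clean, env)
```

Under this model, frame t of the reverberant spectrum mixes in scaled copies of earlier clean frames, which is what makes dereverberation an under-determined problem that the abstract's speech-knowledge constraints (non-negativity, sparsity, cepstral statistics) help resolve.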
Apart from dereverberation, we also provide two approaches to noise robustness. The first combines audio and visual features, with new visual features derived from profile-view images of the speaker. The second applies a temporal-difference operation in the speech spectral domain, where, via a theoretical analysis, we also predict the expected improvement in the SNR threshold shift under white-noise conditions. Finally, we combine our individual dereverberation and noise compensation approaches for a joint noise and reverberation compensation task.
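To make the temporal-difference idea concrete: stationary noise changes little between nearby frames, so subtracting an earlier frame's spectrum from the current one largely cancels the noise while preserving the faster-varying speech component. The snippet below is a minimal sketch of that intuition; the lag value, the flooring at zero, and the toy signal are assumptions, not the specific operation analyzed in the talk.

```python
import numpy as np

def temporal_difference(spec, lag=2):
    """Spectral temporal difference: subtract the spectrum from `lag`
    frames earlier, flooring at zero so the result stays a valid
    (non-negative) power spectrum. Stationary noise is nearly constant
    across nearby frames and so largely cancels. Illustrative sketch."""
    diff = spec[lag:] - spec[:-lag]
    return np.maximum(diff, 0.0)

# toy example: a rising "speech-like" component plus a constant noise floor
frames = np.vstack([np.full(4, 1.0 + 0.5 * t) for t in range(6)])
noisy = frames + 3.0                 # stationary noise offset of 3.0
d = temporal_difference(noisy, lag=2)
# the constant offset cancels; only the frame-to-frame change survives
```

Because the constant 3.0 offset appears in both subtracted frames, it vanishes from the output, leaving only the per-frame change of the underlying signal, which is the mechanism behind the predicted SNR improvement in white-noise conditions.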
Website: http://www.ece.cmu.edu/~kshitizk

Speaker Details

Kshitiz Kumar received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology (IIT) Kharagpur in 2004. He then worked for a year as a Software Engineer in the Multimedia Division of Samsung R&D in Bangalore, India. He joined the direct Ph.D. program in Electrical and Computer Engineering at Carnegie Mellon in August 2005, where he expects to complete his Ph.D. degree by January 2011. His research interests lie in the general areas of signal content analysis, feature extraction, speech recognition, and machine learning. For his Ph.D. work, he has developed algorithms for the robustness of speech recognition to noise and reverberation. He was awarded the IEEE 2008 Spoken Language Processing Award for his ICASSP 2008 work on speech dereverberation.

    • Jeff Running