Some Recent Advances in Gaussian Mixture Modeling for Speech Recognition

  • Ramesh Gopinath | IBM Research

State-of-the-art speech recognition systems based on Hidden Markov Models (HMMs) typically use Gaussian Mixture Models (GMMs) to model the acoustic features associated with each HMM state. For reasons of computation, storage, and robust estimation, the covariance matrices of the Gaussians in these GMMs are typically diagonal. In this talk I will describe several new techniques for better modeling the acoustic features associated with an HMM state, including subspace constrained GMMs (SCGMMs) and non-linear volume-preserving transformations of the acoustic feature space. Even with better models, one has to deal with mismatches between training and test conditions. This problem can be addressed by adapting either the acoustic features or the acoustic models to reduce the mismatch. I will present several approaches to adaptation, including FMAPLR (a variant of FMLLR that works well with very little adaptation data), adaptation of the front-end parameters, and adaptation of SCGMMs. While the ideas presented are explored and evaluated in the context of speech recognition, the talk should appeal to anyone with an interest in statistical modeling.
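
As a point of reference for the diagonal-covariance baseline mentioned above, the following is a minimal sketch of how the log-likelihood of one feature vector under a diagonal-covariance GMM might be computed. It is not taken from the talk; the function names (`log_gaussian_diag`, `gmm_log_likelihood`) and the list-based data layout are illustrative assumptions. With a diagonal covariance, each Gaussian stores only d variances for d-dimensional features, rather than the d(d+1)/2 distinct entries of a full covariance matrix.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density at x of a Gaussian with diagonal covariance diag(var)."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        # Dimensions are independent, so the log density is a sum of 1-D terms.
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_log_likelihood(x, weights, means, variances):
    """log sum_k w_k N(x; mean_k, diag(var_k)), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the largest term for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical two-component model over 2-D features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [1.0, -1.0]]
variances = [[1.0, 2.0], [0.5, 0.5]]
score = gmm_log_likelihood([0.2, -0.3], weights, means, variances)
```

In an HMM-based recognizer, a score of this kind would be evaluated per state and per frame, which is one reason the cheaper diagonal form is attractive.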

Speaker Details

Dr. Gopinath holds a PhD from Rice University and has been with the Speech Group at the IBM T. J. Watson Research Center since March 1994. His primary research interests are in statistical learning, speech recognition, and signal processing. He currently manages the research effort in acoustic and language modeling that supports IBM's telephony and embedded speech recognition product offerings. Prior to this assignment, he led the IBM Broadcast News Transcription team that won the NIST/DARPA Broadcast News Transcription competition in 1998 and 1999.