A Tutorial-style Introduction to Subspace Gaussian Mixture Models for Speech Recognition

Daniel Povey

MSR-TR-2009-111

This is an in-depth, tutorial-style introduction to the techniques involved in training a factor-analyzed style of speech recognition system. Algorithms are explained in detail, with an emphasis on how to implement them rather than on their derivations. The recipe described here is both an extension and a special case of our prior work. Changes include a simplified procedure for initializing these models; the introduction of "sub-models", which save memory and may have modeling advantages; an extended approach to factor-based speaker adaptation that uses the sub-models; and a mechanism for estimating a subspace-constrained version of Constrained MLLR transforms within this framework.