Abstract

In this paper, we propose a robust compensation strategy to deal effectively with extraneous acoustic variations in spontaneous speech recognition. The strategy extends speaker adaptive training: it uses hidden Markov model (HMM) parameter transformations to normalize the extraneous variations in the training data according to a set of predefined conditions. A “compact” model and the associated prior probability density functions (PDFs) of the transformation parameters are estimated using the maximum likelihood criterion. In the testing phase, the compact model and the prior PDFs are used to search for the unknown word sequence based on Bayesian predictive classification (BPC). The proposed strategy is evaluated on the Switchboard task, where it is used to deal with three types of extraneous variation and mismatch in conversational speech recognition: pronunciation variations, inter-speaker variability, and telephone handset mismatch. Experimental results show that a moderate word error rate reduction is achieved in comparison with a well-trained baseline HMM system under identical experimental conditions.