Candidate Talk: Soft Margin Estimation for Automatic Speech Recognition

December 4, 2007
Jinyu Li | Georgia Institute of Technology

In this study, a new discriminative learning framework, called soft margin estimation (SME), is proposed for estimating the parameters of continuous density hidden Markov models (HMMs). The proposed method makes direct use of two successful ideas: the margin concept from support vector machines, to improve generalization capability, and decision feedback learning from discriminative training, to enhance model separation in classifier design. SME directly maximizes the separation between competing models so that a test sample still reaches a correct decision as long as its deviation from the training samples stays within a safe margin. Frame and utterance selection are integrated into a unified framework that selects the training utterances and frames critical for discriminating competing models. SME offers a flexible and rigorous framework that facilitates the incorporation of new margin-based optimization criteria into HMM training. The choice of various loss functions is illustrated, and different kinds of separation measures are defined under the unified SME framework. SME is also shown to be able to jointly optimize feature extraction and HMMs. Both the generalized probabilistic descent algorithm and the extended Baum-Welch algorithm are applied to solve SME.
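The abstract describes the objective only in words. The sketch below illustrates the general idea under stated assumptions: a hinge-style soft-margin loss over an utterance-level separation measure, where the separation is taken to be the frame-normalized log-likelihood ratio between the correct model and the best competing model. The function names `separation` and `soft_margin_objective`, the margin `rho`, and the balance coefficient `lam` are hypothetical choices for illustration; this is not presented as the exact SME formulation from the talk.

```python
from typing import Sequence


def separation(correct_loglik: float, competitor_loglik: float, num_frames: int) -> float:
    """Hypothetical utterance-level separation measure (an assumption, not
    necessarily the talk's definition): log-likelihood ratio between the
    correct model and the best competing model, normalized by frame count."""
    return (correct_loglik - competitor_loglik) / num_frames


def soft_margin_objective(separations: Sequence[float], rho: float, lam: float) -> float:
    """Sketch of a soft-margin objective to be minimized.

    lam / rho shrinks as the margin rho grows, rewarding a larger margin.
    The hinge term averages (rho - d) over utterances whose separation d
    falls below rho; utterances already beyond the margin contribute
    nothing, which mimics an utterance-selection step.
    """
    if rho <= 0:
        raise ValueError("margin rho must be positive")
    if not separations:
        raise ValueError("need at least one training utterance")
    # Keep only the utterances that violate the margin (utterance selection).
    hinge_terms = [rho - d for d in separations if d < rho]
    return lam / rho + sum(hinge_terms) / len(separations)


# Three training utterances: one inside the margin, one well separated,
# and one misrecognized (negative separation).
seps = [0.5, 2.0, -0.3]
print(soft_margin_objective(seps, rho=1.0, lam=0.1))
```

Raising `rho` lowers the `lam / rho` term but pulls more utterances into the hinge sum and enlarges each violation, so `lam` sets the trade-off between a wide margin and a low empirical loss.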

SME has demonstrated clear advantages over other discriminative training methods in several speech recognition tasks. Tested on the TIDIGITS digit recognition task, the proposed SME approach achieves a string accuracy of 99.61%, the best result reported in the literature at the time. On the 5k-word Wall Street Journal task, SME reduced the word error rate (WER) from 5.06% for maximum likelihood estimation (MLE) models to 4.11%, a 19% relative WER reduction. The generalization capability of SME was also demonstrated on the Aurora 2 robust speech recognition task, with a relative WER reduction of around 30% over the clean-trained baseline.
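The relative reduction quoted for the Wall Street Journal task follows from (baseline WER - new WER) / baseline WER. A small check, using a hypothetical helper name:

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# 5k-word WSJ figures from the abstract: MLE baseline vs. SME.
wsj = relative_reduction(5.06, 4.11)
print(f"{wsj:.1%}")  # prints 18.8%, which the abstract rounds to 19%
```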

Speaker Details

Jinyu Li received the B.Eng. and M.Eng. degrees in electrical engineering and information systems from the University of Science and Technology of China (USTC) in 1997 and 2000, respectively, graduating with the highest honor, the Guo Moruo Award. After working as a researcher at the Intel China Research Center, he started and led speech recognition research at Anhui USTC iFlytek from 2001 to 2003; iFlytek is now the most successful speech company in China. Since 2004, he has been a Ph.D. student in the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA. In 2004 he received the Colonel Oscar P. Cleaver Award for the highest score on the Ph.D. preliminary examination.

Jinyu Li's major research interests include several topics in speech recognition: discriminative training, noise robustness, attribute detector design, and spoken language recognition. In these areas, he has already published 16 papers in leading international conferences and journals. He received the Best Student Paper Award at Interspeech 2006.