Dynamically configurable acoustic models for speech recognition
- Mei-Yuh Hwang,
- Xuedong Huang
Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing
Published by Institute of Electrical and Electronics Engineers, Inc.
Senones were introduced in [3] to share hidden Markov model (HMM) parameters at the sub-phonetic level, and decision trees were incorporated in [4] to predict unseen phonetic contexts. In this paper, we describe two applications of the senonic decision tree: (1) dynamically downsizing a speech recognition system for small platforms, and (2) sharing the Gaussian covariances of continuous-density HMMs (CHMMs). We experimented with how to balance the different parameters to obtain the best trade-off between recognition accuracy and system size. The dynamically downsized system, without retraining, performed even better than the regular Baum-Welch trained system. The shared-covariance model performed as well as the unshared full model, which freed us to increase the number of Gaussian means and thereby the accuracy of the model. Combining the downsizing and covariance-sharing algorithms achieved a total error reduction of 8% over the Baum-Welch trained system with approximately the same number of parameters.
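The abstract does not spell out either algorithm. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes each senone (decision-tree leaf) is a single diagonal Gaussian that keeps its training occupancy count, whereas the paper's CHMMs use Gaussian mixtures. Downsizing without retraining is approximated by repeatedly collapsing a node whose children are all leaves into one moment-matched Gaussian, choosing the collapse that loses the least training log-likelihood. Covariance sharing is approximated by tying every leaf under a chosen node to the pooled within-class variance while each leaf keeps its own mean. The names `Node`, `merge`, `downsize`, and `share_variance` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical senonic-tree node. ASSUMPTION: a leaf is one diagonal
    # Gaussian with its occupancy count; internal nodes only hold children.
    count: float = 0.0
    mean: list = field(default_factory=list)
    var: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def merge(nodes):
    """Moment-match several diagonal Gaussians into one (no retraining)."""
    n = sum(x.count for x in nodes)
    dim = len(nodes[0].mean)
    mean = [sum(x.count * x.mean[d] for x in nodes) / n for d in range(dim)]
    # Pooled second moment minus squared merged mean gives the merged variance.
    var = [sum(x.count * (x.var[d] + x.mean[d] ** 2) for x in nodes) / n - mean[d] ** 2
           for d in range(dim)]
    return Node(n, mean, var)


def loglik(node):
    """Training log-likelihood of a leaf's frames under its own ML Gaussian."""
    return -0.5 * node.count * sum(math.log(2 * math.pi * v) + 1 for v in node.var)


def leaves(node):
    return [node] if node.is_leaf() else [l for c in node.children for l in leaves(c)]


def prunable(node):
    """Internal nodes whose children are all leaves (candidates to collapse)."""
    if node.is_leaf():
        return []
    if all(c.is_leaf() for c in node.children):
        return [node]
    return [p for c in node.children for p in prunable(c)]


def downsize(root, max_senones):
    """Greedily collapse subtrees until at most max_senones leaves remain."""
    while len(leaves(root)) > max_senones:
        candidates = prunable(root)
        if not candidates:
            break

        def loss(p):
            # Likelihood lost by replacing p's children with one merged Gaussian.
            return sum(loglik(c) for c in p.children) - loglik(merge(p.children))

        best = min(candidates, key=loss)
        merged = merge(best.children)
        best.count, best.mean, best.var = merged.count, merged.mean, merged.var
        best.children = []
    return root


def share_variance(node):
    """Tie all leaves under node to one pooled within-class diagonal variance."""
    group = leaves(node)
    n = sum(l.count for l in group)
    dim = len(group[0].var)
    shared = [sum(l.count * l.var[d] for l in group) / n for d in range(dim)]
    for l in group:
        l.var = list(shared)  # each leaf keeps its own mean
```

Because `downsize` only needs the stored counts, means, and variances, a smaller model can be cut from the full tree at load time for any target size, which matches the spirit of "dynamically downsizing" without retraining. The greedy likelihood-loss criterion and the choice of which node's leaves share a variance are illustrative assumptions.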
© 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.