Abstract

In this paper, a method is developed for incorporating vowel duration properties into a hidden Markov model (HMM)‐based, large‐vocabulary, speaker‐trained recognition system. It is found that each vowel phoneme spoken in isolated words can be divided into three allophones, each corresponding to a largely distinct range of vowel durations. This division is based on the phonetic context in which the vowel occurs. To incorporate the durational information, each vowel’s HMM is trained by maximum likelihood with three separate sets of transition probabilities, one for each allophone. The output distributions of the HMM are assumed to be the same for all three allophones and are trained jointly, to make the best use of the limited number of available training tokens. The duration‐specific HMMs for vowel allophones have been evaluated in isolated‐word recognition experiments with two male speakers. The results show that they improve recognizer performance, reducing the error rate by approximately 14% compared with recognition without the vowel durational models. This improvement comes both from fewer errors on postvocalic consonants, which are contextually correlated with the durations of the preceding vowels, and from improved discrimination between vowel phonemes.
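The abstract does not specify the model topology or the type of output distribution. The following Python is a minimal sketch under several assumptions. It uses a hypothetical 3‐state left‐to‐right vowel HMM with no skips, discrete outputs over a made‐up 3‐symbol codebook, and illustrative self‐loop probabilities. None of these values come from the paper. The sketch shows the core idea: the three allophones share one output table (`SHARED_EMIT`) and differ only in their transition probabilities. Each transition set implies a different expected vowel duration, and a token is scored against each set with the standard scaled forward algorithm.

```python
import math

# Hypothetical duration classes; self-loop probabilities are illustrative,
# not values from the paper.
SELF_LOOPS = {"short": 0.5, "medium": 0.75, "long": 0.875}

# Hypothetical shared output distributions over a 3-symbol codebook,
# one row per HMM state. All three allophones use this same table.
SHARED_EMIT = [
    [0.7, 0.2, 0.1],  # state 0: vowel onset
    [0.1, 0.8, 0.1],  # state 1: steady portion
    [0.1, 0.2, 0.7],  # state 2: offglide
]


def left_to_right(self_loop, n_states=3):
    """No-skip left-to-right transitions. The mass 1 - self_loop moves to the
    next state, or exits the model when leaving the last state."""
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        trans[i][i] = self_loop
        if i + 1 < n_states:
            trans[i][i + 1] = 1.0 - self_loop
    return trans


# One transition set per allophone; outputs are NOT duplicated.
ALLOPHONE_TRANS = {name: left_to_right(p) for name, p in SELF_LOOPS.items()}


def expected_duration(trans):
    """Mean vowel duration in frames: each state's occupancy is geometric
    with mean 1 / (1 - a_ii)."""
    return sum(1.0 / (1.0 - trans[i][i]) for i in range(len(trans)))


def forward_log_likelihood(obs, trans, emit):
    """Scaled forward algorithm: log P(obs | model), starting in state 0 and
    exiting from the last state after the final frame."""
    n = len(trans)
    alpha = [emit[0][obs[0]]] + [0.0] * (n - 1)
    log_lik = 0.0
    for o in obs[1:]:
        scale = sum(alpha)          # rescale to avoid underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    exit_prob = 1.0 - sum(trans[-1])  # leftover mass of the last row
    return log_lik + math.log(alpha[-1] * exit_prob)


def best_allophone(obs):
    """Name of the allophone transition set that scores the token highest."""
    scores = {
        name: forward_log_likelihood(obs, trans, SHARED_EMIT)
        for name, trans in ALLOPHONE_TRANS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for name, trans in ALLOPHONE_TRANS.items():
        print(name, expected_duration(trans))   # 6.0, 12.0, 24.0 frames
    short_token = [0] * 2 + [1] * 2 + [2] * 2   # 6 frames
    long_token = [0] * 8 + [1] * 8 + [2] * 8    # 24 frames
    print(best_allophone(short_token), best_allophone(long_token))  # prints: short long
```

In this toy topology, every path through the model for a T‐frame token uses exactly T - 3 self‐loops and 3 forward or exit moves. The shared outputs therefore contribute the same factor under every allophone, and the allophone comparison depends only on the token's length. The likelihood peaks at a self‐loop probability of (T - 3)/T. In a full word model the vowel's segment boundaries are not fixed, so outputs and durations interact, and this clean separation is specific to the sketch.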