Handling Phonetic Context and Speaker Variation in a Structure-Based Speech Recognizer

Dong Yu; Li Deng

Proc. Interspeech | August 2007

Published by International Speech Communication Association

Recently we have developed a novel type of structure-based speech recognizer, which uses a parameterized, non-recursive "hidden" trajectory model (HTM) of vocal tract resonances (VTRs), or formants, to capture the dynamic structure of long-range speech coarticulation and reduction. The underlying model of this recognizer performs bi-directional FIR filtering on piecewise-constant sequences of VTR targets. In this paper, we elaborate on two key aspects of the model. First, the phonetic context controls the direction of movement and thus the formation of the VTR trajectories. This provides "structured" context dependency for speech acoustics without the context-dependent parameters that HMMs require. Second, the VTR targets, which are the key context-independent parameters of the model, vary across speakers. We describe an effective target-value normalization algorithm that can be applied to both training speakers and unknown test speakers. We report experimental results demonstrating the effectiveness of the normalization algorithm in structure-based speech recognition. We also provide a computational analysis of the HTM-based speech decoder.
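The bi-directional FIR filtering of piecewise-constant VTR targets can be sketched as follows. This is a minimal illustration under stated assumptions, not the recognizer's implementation. It makes these simplifications:

- It models a single formant dimension.
- It uses one shared coarticulation parameter gamma for every segment, where the model may let it vary by phone.
- The filter window extends D frames on each side of the current frame.
- A normalizing gain makes the 2D+1 weights sum to one.
- Edge targets are repeated past the utterance boundaries.

The function names `filter_gain`, `expand_targets` and `vtr_trajectory` are hypothetical, as are the parameter values (gamma = 0.6, D = 7) and the F2 targets in Hz.

```python
from typing import List, Sequence, Tuple


def filter_gain(gamma: float, D: int) -> float:
    """Normalizing constant c so that the 2D+1 weights c * gamma**|tau| sum to 1."""
    return 1.0 / sum(gamma ** abs(t) for t in range(-D, D + 1))


def expand_targets(segments: Sequence[Tuple[float, int]]) -> List[float]:
    """Turn (target, duration-in-frames) pairs into a piecewise-constant frame sequence."""
    frames: List[float] = []
    for target, duration in segments:
        frames.extend([target] * duration)
    return frames


def vtr_trajectory(targets: Sequence[float], gamma: float, D: int) -> List[float]:
    """Bi-directional (non-causal) FIR smoothing of a piecewise-constant target sequence.

    Each output frame k is a weighted average of the targets at frames k-D..k+D,
    with weight c * gamma**|k - tau|, so upcoming targets pull on the trajectory
    as much as preceding ones do.
    """
    c = filter_gain(gamma, D)
    n = len(targets)
    trajectory: List[float] = []
    for k in range(n):
        acc = 0.0
        for tau in range(k - D, k + D + 1):
            # Assumption: targets beyond either end of the utterance repeat the edge value.
            j = min(max(tau, 0), n - 1)
            acc += c * gamma ** abs(k - tau) * targets[j]
        trajectory.append(acc)
    return trajectory


if __name__ == "__main__":
    # Hypothetical F2 targets (Hz): the same 1500 Hz middle segment in two contexts.
    low_ctx = expand_targets([(1000.0, 10), (1500.0, 6), (1000.0, 10)])
    high_ctx = expand_targets([(2000.0, 10), (1500.0, 6), (2000.0, 10)])
    mid = 13  # a frame near the centre of the middle segment (frames 10..15)
    print(round(vtr_trajectory(low_ctx, 0.6, 7)[mid]))
    print(round(vtr_trajectory(high_ctx, 0.6, 7)[mid]))
```

Running the sketch shows the context effect described above. The same 1500 Hz target is pulled below 1500 Hz between 1000 Hz neighbours and above it between 2000 Hz neighbours. No context-dependent parameter is involved: in this sketch the dependency comes entirely from the filtering.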

© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.