Abstract

An overview is given of a statistical paradigm for speech recognition in which phonetic and phonological knowledge sources are seamlessly integrated into the structure of a speech model. A unifying computational formalism is outlined in which the sub-models for the discrete, feature-based phonological process and the continuous, dynamic phonetic process in human speech production are computationally interfaced, enabling global optimization of the model parameter sets that economically characterize distinct sources of speech variability. The formalism is founded on a rigorous mathematical basis and is developed with the aim of overcoming key limitations of current speech recognition technology.