Abstract

A two-level mixture linear dynamic system model, with frame-level switching parameters in the observation equation and segment-level switching parameters in the target-directed state equation, is developed and evaluated. The main contributions of this work are: (1) a new framework for dealing with mixed-level switching in the dynamic system and (2) the novel use of piecewise linear functions, enabled by the introduction of frame-level switching, to approximate the nonlinear function between the hidden vocal-tract-resonance space and the observable acoustic space. The approximation is accomplished by the frame-dependent switching parameters in the observation equation. In this paper, in a self-contained manner, we highlight the key algorithmic differences from the earlier model, which has only a single level of segment-level switching that is synchronous between the state and observation equations. A series of speech recognition experiments is carried out to evaluate the new model using a subset of Switchboard conversational speech data. The experimental results show that the approximation accuracy improves as the number of switching-parameter values increases. The speech recognizer built from the new mixed-level switching dynamic system model, evaluated in an N-best re-scoring paradigm, shows a moderate word error rate reduction compared with using either single-level switching or no switching parameters.
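
As an illustrative sketch only, a mixed-level switching model of the kind described above might be written as follows. The notation is an assumption introduced here for illustration and is not taken from the abstract: $x_k$ is the hidden vocal-tract-resonance state at frame $k$, $o_k$ is the acoustic observation, $s$ is a segment-level regime index that stays fixed within a segment, $m_k$ is a frame-level switching index, $u_s$ is the segment's target vector, and $w_k$, $v_k$ are noise terms.

```latex
% Target-directed state equation: parameters switch at the segment level (index s)
x_{k+1} = \Phi_s \, x_k + (I - \Phi_s)\, u_s + w_k

% Observation equation: parameters switch at the frame level (index m_k),
% so the nonlinear state-to-acoustics mapping is approximated piecewise linearly
o_k = H_{m_k} \, x_k + h_{m_k} + v_k
```

Under these assumptions, each value of $m_k$ selects one linear piece $(H_{m_k}, h_{m_k})$ of the approximation, so more switching-parameter values give a finer piecewise linear fit to the nonlinear mapping.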
