Abstract

This paper presents a general discriminative training method for the front end of an automatic speech recognition system. The SPLICE parameters of the front end are trained by stochastic gradient descent (SGD) on a maximum mutual information (MMI) objective function. SPLICE is chosen for its ability to approximate both linear and non-linear transformations of the feature space, and SGD for its simplicity of implementation. Results are presented on both the Aurora 2 small-vocabulary task and the WSJ Nov-92 medium-vocabulary task. The discriminative front end consistently increases system accuracy across different front-end configurations and tasks.