How to Train a Discriminative Front End with Stochastic Gradient Descent and Maximum Mutual Information

  • Jasha Droppo,
  • Milind Mahajan,
  • Asela Gunawardana,
  • Alex Acero

Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding

Published by Institute of Electrical and Electronics Engineers, Inc.

This paper presents a general discriminative training method for the front end of an automatic speech recognition system. The SPLICE parameters of the front end are trained using stochastic gradient descent (SGD) on a maximum mutual information (MMI) objective function. SPLICE is chosen for its ability to approximate both linear and non-linear transformations of the feature space; SGD is chosen for its simplicity of implementation. Results are presented on both the Aurora 2 small-vocabulary task and the WSJ Nov-92 medium-vocabulary task. The discriminative front end is shown to consistently increase system accuracy across different front-end configurations and tasks.
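The core idea can be illustrated with a toy sketch. The code below is not the paper's implementation; it is a minimal, hypothetical analogue in which the front end applies a SPLICE-style piecewise bias correction (region selection by nearest region center, a simplification of SPLICE's GMM posteriors), the back end is a pair of fixed unit-variance Gaussian class models standing in for the recognizer's acoustic models, and SGD ascends a frame-level MMI criterion (the log posterior of the correct class). All names, region counts, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy back end: two classes with fixed unit-variance Gaussian models
# (hypothetical stand-ins for the recognizer's acoustic models).
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])  # class means

# "Noisy" features: clean class samples shifted by an unknown channel offset.
offset = np.array([1.5, -1.0])
labels = rng.integers(0, 2, size=400)
x = mu[labels] + rng.normal(size=(400, 2)) + offset

# SPLICE-style front end: K regions, each with its own learned bias correction.
K = 4
centers = x[rng.choice(len(x), K, replace=False)]
bias = np.zeros((K, 2))

def posteriors(y):
    # Class posteriors under unit-variance Gaussians with a uniform prior.
    scores = -0.5 * ((y[None, :] - mu) ** 2).sum(axis=1)
    p = np.exp(scores - scores.max())
    return p / p.sum()

def transform(xi):
    # Select the nearest region and apply its bias (simplified SPLICE).
    k = np.argmin(((xi - centers) ** 2).sum(axis=1))
    return k, xi + bias[k]

def accuracy(features, use_frontend):
    correct = 0
    for xi, ci in zip(features, labels):
        y = transform(xi)[1] if use_frontend else xi
        correct += int(np.argmax(posteriors(y)) == ci)
    return correct / len(features)

acc_before = accuracy(x, use_frontend=False)

# SGD on the frame-level MMI objective: log p(correct class | transformed frame).
lr = 0.1
for epoch in range(30):
    for xi, ci in zip(x, labels):
        k, y = transform(xi)
        p = posteriors(y)
        # Gradient of log p(ci | y) w.r.t. y for unit-variance Gaussians:
        # correct-class mean minus the posterior-weighted mean.
        grad = mu[ci] - p @ mu
        bias[k] += lr * grad  # ascent step on the selected region's bias

acc_after = accuracy(x, use_frontend=True)
```

Even in this toy setting the discriminatively trained biases undo most of the channel offset, so classification accuracy through the front end exceeds the raw-feature accuracy; the paper's method applies the same principle with full SPLICE rotations, lattice-based MMI, and HMM back ends.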