How to Train a Discriminative Front End with Stochastic Gradient Descent and Maximum Mutual Information

  • Jasha Droppo,
  • Milind Mahajan,
  • Asela Gunawardana,
  • Alex Acero

Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding

Published by Institute of Electrical and Electronics Engineers, Inc.

This paper presents a general discriminative training method for the front end of an automatic speech recognition system. The SPLICE parameters of the front end are trained using stochastic gradient descent (SGD) on a maximum mutual information (MMI) objective function. SPLICE is chosen for its ability to approximate both linear and non-linear transformations of the feature space; SGD is chosen for its simplicity of implementation. Results are presented on both the Aurora 2 small-vocabulary task and the WSJ Nov-92 medium-vocabulary task. The discriminative front end is shown to consistently increase system accuracy across different front-end configurations and tasks.
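The core idea can be illustrated with a toy sketch. The code below is not the paper's implementation; it is a minimal, hypothetical analogue in which the front end applies a SPLICE-style piecewise bias correction (region selection by nearest region center, a simplification of SPLICE's GMM posteriors), the back end is a pair of fixed unit-variance Gaussian class models standing in for the recognizer's acoustic models, and SGD ascends a frame-level MMI criterion (the log posterior of the correct class). All names, region counts, and learning rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy back end: two classes with fixed unit-variance Gaussian models
# (hypothetical stand-ins for the recognizer's acoustic models).
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])  # class means

# "Noisy" features: clean class samples shifted by an unknown channel offset.
offset = np.array([1.5, -1.0])
labels = rng.integers(0, 2, size=400)
x = mu[labels] + rng.normal(size=(400, 2)) + offset

# SPLICE-style front end: K regions, each with its own learned bias correction.
K = 4
centers = x[rng.choice(len(x), K, replace=False)]
bias = np.zeros((K, 2))

def posteriors(y):
    # Class posteriors under unit-variance Gaussians with a uniform prior.
    scores = -0.5 * ((y[None, :] - mu) ** 2).sum(axis=1)
    p = np.exp(scores - scores.max())
    return p / p.sum()

def transform(xi):
    # Select the nearest region and apply its bias (simplified SPLICE).
    k = np.argmin(((xi - centers) ** 2).sum(axis=1))
    return k, xi + bias[k]

def accuracy(features, use_frontend):
    correct = 0
    for xi, ci in zip(features, labels):
        y = transform(xi)[1] if use_frontend else xi
        correct += int(np.argmax(posteriors(y)) == ci)
    return correct / len(features)

acc_before = accuracy(x, use_frontend=False)

# SGD on the frame-level MMI objective: log p(correct class | transformed frame).
lr = 0.1
for epoch in range(30):
    for xi, ci in zip(x, labels):
        k, y = transform(xi)
        p = posteriors(y)
        # Gradient of log p(ci | y) w.r.t. y for unit-variance Gaussians:
        # correct-class mean minus the posterior-weighted mean.
        grad = mu[ci] - p @ mu
        bias[k] += lr * grad  # ascent step on the selected region's bias

acc_after = accuracy(x, use_frontend=True)
```

Even in this toy setting the discriminatively trained biases undo most of the channel offset, so classification accuracy through the front end exceeds the raw-feature accuracy; the paper's method applies the same principle with full SPLICE rotations, lattice-based MMI, and HMM back ends.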