Unsupervised Learning from Users’ Error Correction in Speech Dictation

Mei-Yuh Hwang; Dong Yu; Alex Acero; Li Deng

Proc. Int. Conf. on Spoken Language Processing | October 2004

Published by International Speech Communication Association

We propose an approach to adapting automatic speech recognition (ASR) systems used for dictation through unsupervised learning from users’ error corrections. The adaptation involves three steps: 1) infer whether the user is correcting a speech recognition error or simply editing the text, 2) infer the most likely cause of the error, and 3) adapt the system accordingly. To adapt the system effectively, we introduce an enhanced two-pass pronunciation learning algorithm that uses the output of both an n-gram phoneme recognizer and a letter-to-sound (LTS) component. Our experiments show that the proposed approaches reduce the word error rate by more than 10% relative. Learning new words gives the largest gain, while adapting pronunciations and using a cache language model also yield small gains.
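The three adaptation steps can be illustrated with a toy sketch. It is not the authors’ method. The correction-vs-edit test (character similarity standing in for acoustic similarity, with a threshold of 0.6), the rule for attributing the cause, and the unigram cache model are hypothetical simplifications. The two-pass pronunciation learning step is omitted, so new words receive a placeholder pronunciation of None. The names process_change, is_correction, infer_cause, CacheLM, and SIMILARITY_THRESHOLD are illustrative only.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional

# Hypothetical threshold: changes at least this similar to the recognized
# text are treated as error corrections rather than ordinary edits.
SIMILARITY_THRESHOLD = 0.6


class CacheLM:
    """Unigram cache interpolated with a fixed background unigram model:
    P(w) = lam * P_cache(w) + (1 - lam) * P_background(w).
    A real dictation system would use an n-gram background model instead."""

    def __init__(self, background: Dict[str, float], lam: float = 0.1) -> None:
        self.background = background
        self.lam = lam
        self.cache: Counter = Counter()

    def observe(self, words: Iterable[str]) -> None:
        # Count words the user confirmed by typing them.
        self.cache.update(words)

    def prob(self, word: str) -> float:
        total = sum(self.cache.values())
        p_cache = self.cache[word] / total if total else 0.0
        return self.lam * p_cache + (1 - self.lam) * self.background.get(word, 0.0)


def is_correction(recognized: str, edited: str) -> bool:
    """Step 1 (hypothetical heuristic): character-level similarity is a crude
    stand-in for acoustic similarity between recognized and edited text."""
    ratio = SequenceMatcher(None, recognized.lower(), edited.lower()).ratio()
    return ratio >= SIMILARITY_THRESHOLD


def infer_cause(edited: str, lexicon: Dict[str, Optional[str]]) -> str:
    """Step 2 (simplified): an out-of-vocabulary word in the correction is
    blamed for the error; otherwise the error is attributed to the LM."""
    words = edited.lower().split()
    return "new_word" if any(w not in lexicon for w in words) else "in_vocabulary"


def process_change(recognized: str, edited: str,
                   lexicon: Dict[str, Optional[str]], lm: CacheLM) -> str:
    """Run the three steps on one user change and return the decision."""
    if not is_correction(recognized, edited):
        return "edit"  # ordinary editing: learn nothing
    cause = infer_cause(edited, lexicon)
    words = edited.lower().split()
    if cause == "new_word":
        for w in words:
            # Pronunciation unknown here; a pronunciation learner would fill it in.
            lexicon.setdefault(w, None)
    # Step 3: adapt the language model toward the corrected words.
    lm.observe(words)
    return cause


lexicon = {"the": "dh ah", "cat": "k ae t", "sat": "s ae t"}
lm = CacheLM({"the": 0.5, "cat": 0.25, "sat": 0.25}, lam=0.2)
print(process_change("hello", "quit", lexicon, lm))              # edit
print(process_change("the cat sat", "the kat sat", lexicon, lm))  # new_word
print(process_change("the cat", "the sat", lexicon, lm))          # in_vocabulary
```

A real system would compare phone sequences from the recognizer rather than spellings, and would interpolate the cache with a full n-gram model conditioned on the word history.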

© 2007 ISCA. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the ISCA and/or the author.