Online Vocabulary Adaptation using Limited Adaptation Data
- Chang-E Liu ,
- Kit Thambiratnam ,
- Frank Seide
Proc. Interspeech |
Published by IEEE
This paper presents a study of low-latency domain-independent online vocabulary adaptation using limited amounts of supporting text data. The target applications include blind indexing of Internet content, indexing of new content with low latency, and domains where Out-Of-Vocabulary (OOV) words are problematic. A number of methods to perform document-specific adaptation using a small amount of support metadata and the Internet are examined. It is shown that a combination of word feature fusion and cross-file statistics pooling provides robust adaptation. The best evaluated method achieved an absolute reduction of 27.6% in OOV detection false alarm rate over the baseline word feature thresholding methods.
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.http://www.ieee.org/