Online Vocabulary Adaptation using Limited Adaptation Data

Chang-E Liu; Kit Thambiratnam; Frank Seide

Proc. Interspeech | January 2007

Published by IEEE

This paper presents a study of low-latency domain-independent online vocabulary adaptation using limited amounts of supporting text data. The target applications include blind indexing of Internet content, indexing of new content with low latency, and domains where Out-Of-Vocabulary (OOV) words are problematic. A number of methods to perform document-specific adaptation using a small amount of support metadata and the Internet are examined. It is shown that a combination of word feature fusion and cross-file statistics pooling provides robust adaptation. The best evaluated method achieved an absolute reduction of 27.6% in OOV detection false alarm rate over the baseline word feature thresholding methods.

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/