Word-Lattice Based Spoken-Document Indexing with Standard Text Indexers
- Frank Seide
- Kit Thambiratnam
- Roger (Peng) Yu
Proc. IEEE Workshop on Spoken Language Technology (SLT)
Published by IEEE
Indexing the spoken content of audio recordings requires automatic speech recognition, which is not yet reliable. Unlike with text, a speech recognizer cannot tell us with certainty whether a word is present at a given point in the audio; it can only provide a probability. Using these probabilities correctly significantly improves spoken-document search accuracy.
In this paper, we first describe how to improve accuracy for “web-search style” (AND/phrase) queries into audio by using speech recognition alternates and word posterior probabilities derived from word lattices.
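To illustrate why posteriors and alternates can help an AND query, here is a minimal sketch. It is not the paper's scoring formula, which the abstract does not give. It assumes a hypothetical lattice export as `(word, start_sec, posterior)` tuples, sums posteriors into an expected count per word, and scores a two-term AND query as the product of the clipped expected counts. The names `expected_counts` and `and_score` and all numbers are made up for illustration.

```python
from collections import defaultdict

def expected_counts(hyps):
    """Sum posteriors per word: the expected number of occurrences in the document."""
    counts = defaultdict(float)
    for word, _start_sec, posterior in hyps:
        counts[word] += posterior
    return dict(counts)

def and_score(counts, terms):
    """Score an AND query as the product of per-term expected counts, clipped to 1.0.

    Assumption: terms are treated as independent; a real system may rank differently.
    """
    score = 1.0
    for term in terms:
        score *= min(1.0, counts.get(term, 0.0))
    return score

# Hypothetical lattice hypotheses for one document; "recognition" is only an
# alternate here, so a 1-best ("linear text") transcript would miss it.
doc = [
    ("speech", 0.0, 0.9),
    ("beach", 0.0, 0.1),
    ("wreck", 0.5, 0.6),
    ("recognition", 0.5, 0.4),
]

counts = expected_counts(doc)
score = and_score(counts, ["speech", "recognition"])
missing = and_score(counts, ["speech", "lattice"])
```

In this toy document the query "speech AND recognition" receives a non-zero score even though "recognition" is not the top hypothesis, whereas an index built from the single best transcript would return no match.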
Then, we will present an end-to-end approach to doing so using standard text indexers, which by design cannot handle probabilities and unaligned alternates. We present a sequence of approximations that transform the numeric lattice-matching problem into a symbolic text-based one that can be implemented by a commercial full-text indexer.
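The general idea of turning numeric lattice information into symbols can be sketched as follows. This is an illustrative assumption, not the sequence of approximations used in the paper, which the abstract does not detail. The hypothetical function `to_index_tokens` quantizes start times into fixed-width position slots, so that competing alternates share a position, and quantizes posteriors into a few discrete levels that a text indexer could store as ordinary symbols. The parameters `slot_sec`, `levels`, and `min_posterior` are invented for this example.

```python
def to_index_tokens(hyps, slot_sec=0.5, levels=4, min_posterior=0.05):
    """Map (word, start_sec, posterior) hypotheses to (position, word, level) tuples.

    position: integer time slot, so alternates that overlap in time align.
    level:    posterior quantized into `levels` buckets (0 = least confident).
    Hypotheses below `min_posterior` are pruned to keep the index small.
    """
    tokens = []
    for word, start_sec, posterior in hyps:
        if posterior < min_posterior:
            continue
        position = int(start_sec / slot_sec)
        level = min(levels - 1, int(posterior * levels))
        tokens.append((position, word, level))
    return sorted(tokens)

# Hypothetical lattice hypotheses; the last one falls below the pruning threshold.
hyps = [
    ("speech", 0.0, 0.9),
    ("beach", 0.0, 0.1),
    ("recognition", 0.5, 0.6),
    ("ignition", 0.5, 0.02),
]

tokens = to_index_tokens(hyps)
```

Each resulting tuple is purely symbolic, so it could in principle be stored in an index that knows nothing about probabilities; the price is the quantization error introduced by the slots and levels.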
Experiments on a 170-hour lecture set show accuracy improvements of 30-60% for phrase searches and 130% for two-term AND queries, compared to indexing linear text.
© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. http://www.ieee.org/