Abstract

Indexing the spoken content of audio recordings requires automatic speech recognition, which is as of today not reliable. Unlike indexing text, we cannot reliably know from a speech recognizer whether a word is present at a given point in the audio; we can only obtain a probability for it. Correct use of these probabilities significantly improves spoken-document search accuracy.

In this paper, we will first describe how to improve accuracy for “web-search style” (AND/phrase) queries into audio, by utilizing speech recognition alternates and word posterior probabilities based on word lattices.

Then, we will present an end-to-end approach to doing so using standard text indexers, which by design cannot handle probabilities and unaligned alternates. We present a sequence of approximations that transform the numeric lattice-matching problem into a symbolic text-based one that can be implemented by a commercial full-text indexer.

Experiments on a 170-hour lecture set show an accuracy improvement by 30-60% for phrase searches and by 130% for two-term AND queries, compared to indexing linear text.