Abstract

This paper presents the Position Specific
Posterior Lattice (PSPL), a novel representation
of automatic speech recognition lattices
that naturally lends itself to efficient indexing
of position information and subsequent
relevance ranking of spoken documents
using proximity.
In experiments performed on a collection
of lecture recordings (MIT iCampus
data), the spoken document ranking
accuracy was improved by 20% relative
over the commonly used baseline of
indexing the 1-best output from an automatic
speech recognizer. The Mean Average
Precision (MAP) increased from 0.53
when using 1-best output to 0.62 when using
the new lattice representation. The reference
used for evaluation is the output of
a standard retrieval engine working on the
manual transcription of the speech collection.
Albeit lossy, the PSPL lattice is also much
more compact than the ASR 3-gram lattice
from which it is computed (which
translates into a reduced inverted index size
as well) at virtually no degradation in
word-error-rate performance. Since new
paths are introduced in the lattice, the ORACLE
accuracy increases over the original
ASR lattice.