Abstract

We study lattice rescoring with knowledge scores for automatic
speech recognition. A frame-based log-likelihood ratio is adopted as
a measure of the goodness of fit between a speech segment
and the knowledge sources. We evaluate our approach in two
applications: phone recognition and continuous connected-digit
recognition. By incorporating knowledge scores obtained
from 15 attribute detectors for place and manner of articulation,
we reduced the phone error rate from 40.52% to 35.16% using monophone
models. The error rate can be further reduced to 33.42% for
triphone models. The same lattice rescoring algorithm is extended
to connected-digit recognition on the TIDIGITS database
without using any digit-specific training data. The digit error rate
is reduced from 4.54%, obtained with the conventional Viterbi decoding
algorithm without knowledge scores, to 4.03%.
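
As a rough illustration of the idea summarized above, the following minimal Python sketch computes a frame-averaged log-likelihood ratio for one segment and combines it with an acoustic score on a lattice arc. The function names, the use of a competing ("anti") model, the averaging over frames, and the linear interpolation weight are all assumptions made for illustration; the abstract does not specify the exact scoring or combination formula.

```python
from typing import Sequence


def frame_llr(target_loglik: Sequence[float], anti_loglik: Sequence[float]) -> float:
    """Average per-frame log-likelihood ratio over one speech segment.

    Assumption: each frame has a log likelihood under a target model and
    under a competing (anti) model; the segment score is their mean difference.
    """
    if len(target_loglik) != len(anti_loglik) or len(target_loglik) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [t - a for t, a in zip(target_loglik, anti_loglik)]
    return sum(diffs) / len(diffs)


def rescore_arc(acoustic_score: float, knowledge_score: float, weight: float = 0.5) -> float:
    """Combine an arc's acoustic score with a knowledge score.

    Assumption: simple linear interpolation; `weight` is a hypothetical
    tuning parameter, not a value taken from the paper.
    """
    return (1.0 - weight) * acoustic_score + weight * knowledge_score


# Toy values for a three-frame segment (made up for illustration).
segment_score = frame_llr([-1.0, -2.0, -1.5], [-2.0, -2.5, -3.5])
new_arc_score = rescore_arc(acoustic_score=-10.0, knowledge_score=2.0, weight=0.25)
```

In a full system, every arc in the lattice would be rescored this way and the best path searched again; that search step is omitted here.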
