Abstract

The accuracy of a speech recognition (SR) system depends on
many factors, such as the presence of background noise,
mismatches in microphone and language models, variations in
speaker, accent, and even speaking rate. Besides naturally fast
speakers, even speakers with a normal rate tend to speak faster
when using a speech recognition system in order to achieve higher
throughput. Unfortunately, state-of-the-art SR systems perform
significantly worse on fast speech. In this paper, we present
our efforts to make our system more robust to fast speech.
We propose cepstrum length normalization, applied to incoming
test utterances, which yields a 13% word error rate reduction
on an independent evaluation corpus. Moreover, this improvement
is additive to the contribution of Maximum Likelihood Linear
Regression (MLLR) adaptation. Combined with MLLR, a 23% error
rate reduction was achieved.
