Abstract

We recently proposed a method for HMM adaptation to noisy
environments called Linear Spline Interpolation (LSI). LSI uses
linear spline regression to model the relationship between clean
and noisy speech features. In the original algorithm, stereo
training data was used to learn the spline parameters that minimize
the error between the predicted and actual noisy speech
features. The estimated splines are then used at runtime to adapt
the clean HMMs to the current environment. While good results
can be obtained with this approach, the performance is limited
by the fact that the splines are trained independently from the
speech recognizer and may therefore be suboptimal
for adaptation. In this work, we introduce a new Generalized
EM algorithm for estimating the spline parameters using
the speech recognizer itself. Experiments on the Aurora 2 task
show that using LSI adaptation with splines trained in this manner
results in a 20% improvement over the original LSI algorithm
that used splines estimated from stereo data and a 28%
improvement over VTS adaptation.
