Abstract

The performance of speech cleaning and noise adaptation
algorithms is heavily dependent on the quality of the noise
and channel models. Various strategies have been proposed
in the literature for adapting to the current noise and channel
conditions. In this paper, we describe the joint learning
of noise and channel distortion in a novel framework called
ALGONQUIN. The learning algorithm employs a generalized
expectation-maximization (EM) strategy wherein the E step is approximate. We
discuss the characteristics of the new algorithm, with a focus
on convergence rates and parameter initialization. We
show that the learning algorithm can successfully disentangle
the non-linear effects of noise from the linear effects of the
channel and achieve a relative reduction in word error rate (WER) of 21.8%
over the non-adaptive algorithm.
