Abstract

In this chapter we review a number of techniques that, individually and collectively, substantially reduce speech recognition error rates in difficult acoustical environments, including those with unknown additive noise and/or unknown linear filtering. We compare empirically-derived and structurally-based approaches to acoustical pre-processing. Empirical compensation approaches are quite easy to implement, but they require prior access to examples of simultaneously-recorded speech in the training and testing domains. Model-based compensation procedures require a valid parametric characterization of the testing environment, but they do not require access to "stereo" databases. The performance of model-based compensation procedures also converges more rapidly in new testing environments. Cepstral high-pass filtering procedures provide substantial robustness at almost zero cost, and are recommended universally. We also note that the use of microphone arrays can provide a further improvement in recognition accuracy that is complementary to the benefit provided by acoustical pre-processing techniques. Finally, we discuss several issues concerning the use of signal processing algorithms based on models of the human auditory periphery, which have not yet provided substantial quantitative reductions in recognition error rate.
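
As an illustration of why cepstral high-pass filtering costs so little, the sketch below uses per-utterance cepstral mean normalization, which is assumed here as the simplest example of such a procedure and is not necessarily the specific variant evaluated in the chapter. It relies on the common approximation that a fixed linear filter adds a constant offset to every cepstral frame. The function name, array shapes, and synthetic data are all hypothetical.

```python
import numpy as np


def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean of each cepstral coefficient.

    cepstra: array of shape (num_frames, num_coeffs).
    Removing the mean removes the zero-frequency component of each
    coefficient's time trajectory, i.e. a crude high-pass filter.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Synthetic stand-in for real cepstral features (assumption: 200 frames, 13 coefficients).
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))

# Assumption: unknown linear filtering appears as a constant additive cepstral offset.
channel_offset = rng.normal(size=(1, 13))
filtered = clean + channel_offset

normalized_clean = cepstral_mean_normalization(clean)
normalized_filtered = cepstral_mean_normalization(filtered)

# Under the constant-offset assumption, the two normalized feature sets coincide.
print(np.allclose(normalized_clean, normalized_filtered))
```

The only work per utterance is one mean and one subtraction per coefficient. Note that this sketch addresses the linear-filtering offset only; it does not model additive noise.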