Abstract

Speech enhancement and recognition in noisy, reverberant conditions is a challenging open problem. We present a new approach to this problem, developed within the framework of probabilistic modeling. Our approach incorporates information about the statistical structure of speech signals through a speech model that is pre-trained on a large dataset of clean speech. The speech model is one component of a larger model describing the observed sensor signals. This larger model is parametrized by the coefficients of the reverberation filters and the spectra of the sensor noise. We develop an EM algorithm that estimates these parameters from data and constructs a Bayes-optimal estimator of the original speech signal.