Abstract

In this paper, we describe two families of algorithms for hands-free speech recognition using microphone arrays. Enhancement-based approaches use a cascade of independent processing blocks to perform speech enhancement followed by speech recognition. We discuss why this approach may be sub-optimal and motivate the need for a solution that tightly integrates all processing blocks into a unified framework. This leads to a second family of algorithms, called unified approaches, which treat all processing stages as components of a single system operating with the common goal of improving recognition accuracy. We describe several examples of such algorithms that have been shown to outperform more traditional signal-processing-based approaches. In doing so, we hope to convey the benefits of performing hands-free speech recognition in this manner and to motivate further research in this area.