New Directions in Robust Automatic Speech Recognition
- Richard Stern | Carnegie Mellon University
As speech recognition technology is transferred from the laboratory to the marketplace, robustness in recognition is becoming increasingly important. This talk will review and discuss several classical and contemporary approaches to robust speech recognition.
The most tractable types of environmental degradation are produced by quasi-stationary additive noise and quasi-stationary linear filtering. These distortions can be largely ameliorated by the "classical" techniques of cepstral high-pass filtering (as exemplified by cepstral mean normalization and RASTA filtering), as well as by techniques that develop statistical models of the distortion (such as codeword-dependent cepstral normalization and vector Taylor series expansion). Nevertheless, these approaches provide little improvement when speech is degraded by transient or non-stationary noise such as background music or competing speech. We describe and compare the effectiveness of techniques based on missing-feature compensation, multi-band analysis, feature combination, and physiologically motivated auditory scene analysis for improving recognition accuracy in difficult acoustical environments.
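Why do the "classical" cepstral high-pass techniques handle linear filtering so well? A stationary channel multiplies the speech spectrum, so in the log-spectral or cepstral domain it becomes an additive constant: y = x + h. Additive noise is harder, because in the log-spectral domain it enters nonlinearly, y = x + h + log(1 + e^(n - x - h)), which is what motivates model-based methods such as vector Taylor series. The sketch below is an illustration only, not the speaker's implementation. It shows cepstral mean normalization (subtracting the per-utterance mean of each coefficient) and a RASTA-style band-pass filter applied to each coefficient's trajectory. Several details are assumptions: the function names, the synthetic random "cepstra", and the use of the commonly cited RASTA transfer function 0.1(2 + z^-1 - z^-3 - 2z^-4)/(1 - 0.98z^-1). Here that filter is applied causally (delayed by 4 frames) to cepstra rather than to log critical-band spectra as in the original RASTA formulation.

```python
import numpy as np


def cepstral_mean_normalize(cepstra: np.ndarray) -> np.ndarray:
    """Cepstral mean normalization (CMN).

    cepstra: array of shape (num_frames, num_coeffs). A fixed linear channel
    adds the same vector h to every frame, so subtracting the per-utterance
    mean of each coefficient removes it (along with the speech mean).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def rasta_filter(cepstra: np.ndarray, pole: float = 0.98) -> np.ndarray:
    """RASTA-style band-pass IIR filter along the time (frame) axis.

    Assumed transfer function (causal form, 4-frame delay):
        H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1)
    The numerator taps sum to zero, so H(1) = 0: a constant offset such as
    a stationary channel is rejected once the initial transient decays.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR (numerator) taps
    out = np.zeros_like(cepstra)
    for t in range(cepstra.shape[0]):
        # Numerator: weighted sum of the current and previous 4 input frames.
        acc = np.zeros(cepstra.shape[1])
        for k, bk in enumerate(b):
            if t - k >= 0:
                acc += bk * cepstra[t - k]
        # Denominator: y[t] = ... + pole * y[t-1] (single real pole).
        if t > 0:
            acc += pole * out[t - 1]
        out[t] = acc
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))   # synthetic "clean" cepstra
    channel = rng.normal(size=(1, 13))   # fixed channel offset h
    noisy = clean + channel
    # CMN output is identical with or without the channel offset.
    print(np.allclose(cepstral_mean_normalize(noisy),
                      cepstral_mean_normalize(clean)))  # True
```

Note that CMN removes the channel exactly only when the utterance is long enough for its mean to be meaningful, whereas the RASTA-style filter works frame by frame and needs no utterance-level statistics. Neither helps with the non-stationary interference (music, competing speech) that the abstract identifies as the harder problem.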
Speaker Details
Richard M. Stern received the S.B. degree from the Massachusetts Institute of Technology in 1970, the M.S. from the University of California, Berkeley, in 1972, and the Ph.D. from MIT in 1977, all in electrical engineering. He has been on the faculty of Carnegie Mellon University since 1977, where he is currently a Professor in the Electrical and Computer Engineering, Computer Science, and Biomedical Engineering Departments, and the Language Technologies Institute. Much of Dr. Stern's current research is in spoken language systems, where he is particularly concerned with developing techniques that make automatic speech recognition more robust to changes in environment and acoustical ambience. He has also developed sentence parsing and speaker adaptation algorithms for earlier CMU speech systems. In addition to his work in speech recognition, Dr. Stern maintains an active research program in psychoacoustics, where he is best known for theoretical work in binaural perception. Dr. Stern is a member of the IEEE and the Acoustical Society of America, and he was a recipient of the Allen Newell Award for Research Excellence in 1992.
Jeff Running