Distant conversational speech recognition: Challenges and Opportunities
- Dr. Samuele Cornell, Carnegie Mellon University; Sunit Sivasankaran, Microsoft
State-of-the-art automatic speech recognition (ASR) systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, and they often rely on oracle segmentation, which yields overly optimistic results. Distant ASR (DASR) faces unique challenges, including overlapping speech, varied recording setups, and dynamic speaker interactions, that significantly complicate system development. Despite these difficulties, spontaneous conversational speech represents the next frontier for developing more human-like AI agents capable of natural multi-party communication. This talk presents recent advances in DASR through three interconnected efforts: (1) the CHiME-7 and CHiME-8 DASR challenges, which established rigorous benchmarks for generalizable, robust meeting transcription; (2) end-to-end joint modeling that unifies speaker diarization and speech recognition in a single framework, moving beyond traditional pipeline approaches; and (3) synthetic data generation that leverages large language models and text-to-speech systems to create realistic multi-speaker training data at scale.
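As an illustration not drawn from the talk itself: one way to score multi-speaker transcripts without depending on oracle segment boundaries is concatenated minimum-permutation word error rate (cpWER), a metric used in the CHiME challenge series. The minimal sketch below assumes transcripts arrive as hypothetical dictionaries mapping speaker labels to lists of utterance strings. It concatenates each speaker's words, then picks the speaker mapping with the fewest total word errors. It ignores text normalization and timing constraints (which variants such as tcpWER add), and it returns a division-by-zero error if the reference is empty.

```python
from itertools import permutations


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match if equal
            )
        prev = cur
    return prev[-1]


def cp_wer(ref_by_spk: dict[str, list[str]], hyp_by_spk: dict[str, list[str]]) -> float:
    """cpWER sketch: concatenate each speaker's utterances, then take the
    speaker mapping that minimizes the total number of word errors."""
    # One word sequence per speaker; segment boundaries are discarded here,
    # so the score does not require matching the reference segmentation.
    refs = [" ".join(utts).split() for utts in ref_by_spk.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_spk.values()]
    # Pad with empty "speakers" so missed or extra speakers are counted
    # as pure deletions or pure insertions.
    n = max(len(refs), len(hyps))
    refs += [[]] * (n - len(refs))
    hyps += [[]] * (n - len(hyps))
    # Brute force over all speaker mappings: only practical for a few speakers.
    best = min(
        sum(edit_distance(r, hyps[p]) for r, p in zip(refs, perm))
        for perm in permutations(range(n))
    )
    total_ref_words = sum(len(r) for r in refs)
    return best / total_ref_words


# Hypothetical example: speaker labels differ and one word is dropped.
ref = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hyp = {"spk1": ["fine thanks"], "spk2": ["hello there how you"]}
print(cp_wer(ref, hyp))  # 1 deletion over 7 reference words, about 0.143
```

Because the cost of each reference/hypothesis speaker pair adds up independently, practical scorers can replace the factorial permutation search with a linear assignment solver (the Hungarian algorithm) and obtain the same minimum.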
Dr. Samuele Cornell, Postdoctoral Research Associate, Carnegie Mellon University
Sunit Sivasankaran, Applied Scientist, Microsoft