Spatial Audio Rendering for Live Speech Translation

  • Margarita Geleta, UC Berkeley

Language barriers in virtual meetings remain a persistent challenge to global collaboration. While real-time translation technologies offer a promising solution, their integration into conversational interfaces often neglects key perceptual cues. This study explores how spatial audio rendering of translated speech affects comprehension, cognitive load, and user experience in multilingual teleconferencing. We conducted a within-subjects experiment simulating global team meetings with 8 confederates (speakers) and 47 participants (listeners), using Wizard-of-Oz live English translations of conversations in Greek, Kannada, Mandarin Chinese, and Ukrainian, four languages selected for their diversity in grammar, script, and resource availability. Participants experienced four audio conditions for the translated speech: spatial audio (aligned with the speaker's on-screen location) with and without background reverberation, and two non-spatial configurations (diotic and monaural). We measured listener comprehension accuracy, NASA-TLX workload ratings, and satisfaction Likert scores, complemented by qualitative feedback.
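
For concreteness, the sketch below illustrates how the four audio conditions could be produced from a mono translated-speech stream. It is a minimal Python/NumPy approximation using a simple ITD/ILD panning model and a toy reverberation tail, not the authors' rendering pipeline; the sample rate, head radius, pan law, and reverb parameters are all assumptions made for illustration.

```python
"""Illustrative sketch (not the study's actual renderer): produces stereo
signals for the four conditions described in the abstract. All parameter
values below are assumptions."""
import numpy as np

FS = 48_000           # sample rate in Hz (assumed)
HEAD_RADIUS = 0.0875  # average head radius in metres (Woodworth model)
SPEED_OF_SOUND = 343  # m/s

def spatialize(mono: np.ndarray, azimuth_deg: float, reverb: bool = False) -> np.ndarray:
    """Pan a mono signal toward `azimuth_deg` (negative = left, positive = right),
    approximating 'spatial audio aligned with the speaker's on-screen location'."""
    az = np.radians(azimuth_deg)
    # Interaural time difference (Woodworth approximation) -> integer sample delay
    itd = HEAD_RADIUS / SPEED_OF_SOUND * (az + np.sin(az))
    delay = int(round(abs(itd) * FS))
    # Interaural level difference via a constant-power pan law
    pan = (np.sin(az) + 1) / 2                      # 0 = hard left, 1 = hard right
    gain_l, gain_r = np.cos(pan * np.pi / 2), np.sin(pan * np.pi / 2)
    left, right = gain_l * mono, gain_r * mono
    if itd > 0:    # source on the right: the left ear hears it slightly later
        left = np.concatenate([np.zeros(delay), left])[: len(mono)]
    elif itd < 0:  # source on the left: delay the right ear instead
        right = np.concatenate([np.zeros(delay), right])[: len(mono)]
    stereo = np.stack([left, right], axis=1)
    if reverb:     # toy room response: exponentially decaying noise tail
        rng = np.random.default_rng(0)
        n_ir = int(0.3 * FS)
        ir = rng.standard_normal(n_ir) * np.exp(-6 * np.linspace(0, 1, n_ir))
        stereo = np.stack(
            [np.convolve(stereo[:, c], ir)[: len(mono)] for c in range(2)], axis=1
        )
    return stereo

def diotic(mono: np.ndarray) -> np.ndarray:
    """Identical signal in both ears (first non-spatial control)."""
    return np.stack([mono, mono], axis=1)

def monaural(mono: np.ndarray, ear: str = "left") -> np.ndarray:
    """Signal in one ear only (second non-spatial control)."""
    silent = np.zeros_like(mono)
    return np.stack([mono, silent] if ear == "left" else [silent, mono], axis=1)
```

In such a setup, the azimuth passed to `spatialize` would be derived from the speaker's tile position in the meeting grid, so the translated voice appears to come from the same direction as the video feed; the diotic and monaural functions correspond to the two non-spatial baselines.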

Results show that participants listening to spatially rendered translated speech were more than twice as likely to comprehend the translated content as those listening to non-spatial renderings, and reported approximately 2.4% lower perceived listening effort. Participants also reported greater clarity and engagement when spatial cues and voice timbre differentiation were preserved. We discuss design implications for integrating real-time translation into virtual meeting platforms, offering guidelines for delivering translated speech in ways that minimize cognitive load and improve conversational clarity. These findings advance best practices for inclusive, cross-language communication in telepresence systems.