Research Intern – Interactive Multimodal Futures Group (Situated & Affective Computing)
The Interactive Multimodal Futures (IMF) group at Microsoft Research seeks a PhD-level Research Intern to work on a project at the intersection of situated interaction, affective computing, and human-centered AI systems. The project will include…
Spatial Audio Rendering for Speech Live Translation
Language barriers in virtual meetings remain a persistent challenge to global collaboration. While real-time translation technologies offer a promising solution, their integration into conversational interfaces often neglects key perceptual cues. This study explores how spatial…
Research Intern – Microsoft CoreAI Speech
The CoreAI Speech Group is on a mission to develop the core speech technologies that empower millions of users to achieve more. We are seeking Research Interns to contribute to pioneering research in speech and audio.…
Distant conversational speech recognition: Challenges and Opportunities
State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation…
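The snippet does not name its error metric. Assuming it is word error rate (WER), the standard ASR metric, the following minimal sketch shows how such a figure is typically computed: a word-level Levenshtein alignment where WER = (substitutions + deletions + insertions) / number of reference words. The function name word_error_rate is hypothetical and is not taken from the paper's evaluation pipeline, which the snippet does not describe.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "turn the lights off in the kitchen"
    hyp = "turn lights of in the kitchen please"
    # One deletion ("the"), one substitution ("off" -> "of"), one insertion ("please").
    print(round(word_error_rate(ref, hyp), 3))  # 0.429
```

Because insertions count as errors, WER is not capped at 100%; a hypothesis much longer than its reference can exceed it.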
FOA Tokenizer: Learning Discrete Representations of Spatial Audio with Multichannel VQ-GAN
Spatial audio captures the directional and environmental characteristics of sound, enabling immersive listening experiences. First-Order Ambisonics (FOA) provides a compact representation of spatial audio by encoding the sound field’s directional components across four channels, allowing…
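To illustrate how four FOA channels carry direction, here is a minimal sketch that encodes a mono signal as a single plane wave and then recovers its direction from the channels. It assumes the common AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), with azimuth measured counterclockwise from the front and elevation upward from the horizontal plane. The functions encode_foa and estimate_direction are hypothetical illustrations and are not part of the FOA Tokenizer work.

```python
import math
from typing import List, Tuple

# Assumed channel order: ACN (W, Y, Z, X), SN3D normalization ("AmbiX").
FoaSignal = Tuple[List[float], List[float], List[float], List[float]]


def encode_foa(mono: List[float], azimuth_deg: float, elevation_deg: float) -> FoaSignal:
    """Encode a mono signal as one plane wave arriving from (azimuth, elevation)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gy = math.sin(az) * math.cos(el)  # left/right figure-of-eight gain
    gz = math.sin(el)                 # up/down figure-of-eight gain
    gx = math.cos(az) * math.cos(el)  # front/back figure-of-eight gain
    w = list(mono)                    # omnidirectional channel (SN3D gain 1)
    y = [s * gy for s in mono]
    z = [s * gz for s in mono]
    x = [s * gx for s in mono]
    return w, y, z, x


def estimate_direction(foa: FoaSignal) -> Tuple[float, float]:
    """Estimate direction from the time-summed intensity-like vector W * (X, Y, Z)."""
    w, y, z, x = foa
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.degrees(math.atan2(iy, ix))
    elevation = math.degrees(math.atan2(iz, math.hypot(ix, iy)))
    return azimuth, elevation


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at an assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    foa = encode_foa(tone, azimuth_deg=30.0, elevation_deg=10.0)
    print([len(ch) for ch in foa])   # [1600, 1600, 1600, 1600]
    print(estimate_direction(foa))   # approximately (30.0, 10.0)
```

The omnidirectional W channel carries the signal itself, while X, Y, and Z scale it by the direction cosines of the source; this is why a single mono source expands to four correlated channels whose relative gains encode where the sound comes from.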