Azure Cognitive Services Research

The Speech Research Team is part of the Azure AI Cognitive Services Research (CSR) group and is responsible for fundamental advances in audio, speech, and spoken language processing technologies. We also work closely with engineering and product teams to bring new technologies into Microsoft products.

We work on a wide range of speech processing problems, including speech enhancement, speech recognition, speaker diarization, multi-lingual speech recognition, spoken language understanding, end-to-end modeling, and self-supervised learning. Our recent work covers the following topics:

Deep learning-based real-time speech enhancement
Monaural and multi-channel speech separation for meeting transcription
Ad hoc microphone arrays
End-to-end modeling for speaker-attributed speech recognition
Unified speech representation learning
Speech-language pre-training

The results of our work are delivered to Microsoft speech technologies and interwoven into various products. We have also contributed to the development of new services, such as Conversation Transcription in Azure Cognitive Services, which powers the transcription features of several Microsoft products. Our work earned first place in the speaker diarization track of VoxSRC-20 (joint work with other Microsoft researchers) and achieved breakthrough human parity performance on the Switchboard conversational speech recognition task.

The former Speech and Dialog Research Group (SDRG) was merged with the Azure Computer Vision Group in 2020 to form the Cognitive Services Research Group.