Speech and Dialog Research Group





Research in speech recognition, language modeling, language understanding, spoken language systems, and multimodal dialog systems.


The mission of the Speech and Dialog Research Group (SDRG) is to make fundamental contributions to the state of the art in speech and language technology, both within Microsoft and in the external research community.

We conduct cutting-edge research in all aspects of spoken language processing, including Speech Recognition; Speech Enhancement; Speaker Recognition, Verification, and Diarization; Audio-Visual Fusion; Machine Reading Comprehension; Text Summarization; Language Modeling; Dialog; and related topics in Natural Language Processing, Understanding, and Generation. We use large-scale GPU and CPU clusters, together with internal and public data sets, to develop world-leading deep learning technologies for forward-looking topics such as audio-visual far-field meeting transcription, automatic meeting minutes generation, and multimodal dialog systems. We publish our results on public benchmarks, including our breakthrough human-parity performance on the Switchboard conversational speech recognition task and on Stanford’s Conversational Question Answering Challenge (CoQA).

In addition to expanding our scientific understanding of speech and natural language processing, our work finds outlets in Microsoft products such as HoloLens, Azure, Windows, Office, Bing, Cortana, Skype Translator, Xbox, and Azure Cognitive Services.

The Speech and Dialog Research Group is managed by Michael Zeng.



Microsoft pushes ahead with conversation transcription, virtual microphone arrays

Microsoft demonstrated some interesting advancements on the smart-meetings front this week during its Build 2019 keynote. Company officials showed off a new Conversation Transcription capability that’s part of its Azure Speech Service. The new capability, now in preview, allows real-time transcription of multi-user conversations…

ZDNet | May 10, 2019

Microsoft’s Conversation Transcription demo wows as new hardware revealed

Microsoft has figured out real-time conversation transcription, revealing a new Azure-integrated conical reference design speaker along with a way to turn every phone and laptop in a meeting into an ad-hoc voice recognition array. The Build 2019 demo highlighted how a combination of edge devices and cloud processing could better work in harmony…

SlashGear | May 6, 2019