Azure Cognitive Services Research

The mission of the Azure Cognitive Services Research group (CSR) is to make fundamental contributions to advancing the state of the art of the most challenging problems in speech, language, and vision—both within Microsoft and the external research community. The CSR includes Computer Vision, Knowledge and Language, and Speech teams.

We conduct cutting edge research in all aspects of spoken language processing and computer vision. This includes audio-visual fusion; visual-semantic reasoning; federated learning; speech recognition; speech enhancement; speaker recognition and diarization; machine reading comprehension; text summarization; multilingual language modeling; and related topics in natural language processing, understanding, and generation; as well as face forgery detection; object detection and segmentation; dense pose, head, and mask tracking, action recognition; image and video captioning; and other topics in image and real-time video understanding. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for forward-looking topics such as audio-visual far-field meeting transcription, automatic meeting minutes generation, and multi-modal dialog systems. We publish our research on public benchmarks, such as our breakthrough human parity performances on the Switchboard conversational speech recognition task, CommonenseQA (opens in new tab) and Stanford’s Conversational Question Answering Challenge (CoQA).

In addition to expanding our scientific understanding of speech, language, and vision, our work finds outlets in Microsoft products such as Azure Cognitive Services (opens in new tab), HoloLens, Teams, Windows, Office, Bing, Cortana, Skype Translator, Xbox, and more.

The Azure Cognitive Services Research group is managed by Michael Zeng.