The mission of the Cognitive Services Research group (CSR) is to make fundamental contributions to advancing the state of the art on the most challenging problems in speech, language, and vision, both within Microsoft and in the external research community.
We conduct cutting-edge research in all aspects of spoken language processing and computer vision. This includes audio-visual fusion; visual-semantic reasoning; federated learning; speech recognition; speech enhancement; speaker recognition and diarization; machine reading comprehension; text summarization; multilingual language modeling; and related topics in natural language processing, understanding, and generation; as well as face forgery detection; object detection and segmentation; dense pose, head, and mask tracking; action recognition; image and video captioning; and other topics in image and real-time video understanding. We leverage large-scale GPU and CPU clusters, as well as internal and public data sets, to develop world-leading deep learning technologies for forward-looking topics such as audio-visual far-field meeting transcription, automatic meeting minutes generation, and multi-modal dialog systems. We evaluate our research on public benchmarks, achieving breakthrough human-parity performance on the Switchboard conversational speech recognition task and Stanford's Conversational Question Answering Challenge (CoQA).
In addition to expanding our scientific understanding of speech, language, and vision, our work ships in Microsoft products such as Azure Cognitive Services, HoloLens, Teams, Windows, Office, Bing, Cortana, Skype Translator, Xbox, and more.
The Cognitive Services Research group is managed by Michael Zeng.
For more information on our vision research or recent progress leveraging knowledge and language, please see the pages for our Computer Vision and Knowledge and Language teams.