Cognitive Services Research



Artificial Neural Network Features for Speaker Diarization

The relation of eye gaze and face pose: Potential impact on speech recognition

An Introduction to Computational Networks and the Computational Network Toolkit

Neural Network Models for Lexical Addressee Detection

Speech Emotion Recognition Using Deep Neural Network and Extreme Learning Machine

Highly Accurate Phonetic Segmentation Using Boundary Correction Models and System Fusion


Statistical Modeling of the Speech Signal

Dual stage probabilistic voice activity detector

Reverberated Speech Signal Separation Based on Regularized Subband Feedforward ICA and Instantaneous Direction of Arrival


Commute UX: Voice Enabled In-car Infotainment System

Unified Framework for Single Channel Speech Enhancement


Sound Capture System and Spatial Filter for Small Devices

Data Driven Beamformer Design for Binaural Headset

Robust Design of Wideband Loudspeaker Arrays

An EM-based Probabilistic Approach for Acoustic Echo Suppression


Commute UX: Telephone Dialog System for Location-based Services

Robust Location Understanding in Spoken Dialog Systems Using Intersections

Robust Adaptive Beamforming Algorithm Using Instantaneous Direction of Arrival with Enhanced Noise Suppression Capability

Microphone Array Post-Filter Using Incremental Bayes Learning to Track the Spatial Distribution of Speech and Noise


Microphone Array Post-Processor Using Instantaneous Direction of Arrival

Suppression Rule for Speech Recognition Friendly Noise Suppressors


A Compact Multi-Sensor Headset for Hands-Free Communication

Microphone Array for Headset with Spatial Noise Suppressor

Reverberation Reduction for Improved Speech Recognition

Reverberation Reduction for Better Speech Recognition


The mission of the Cognitive Services Research group (CSR) is to make fundamental contributions to advancing the state of the art on the most challenging problems in speech, language, and vision, both within Microsoft and in the external research community. CSR includes the Computer Vision, Knowledge and Language, and Speech teams.

We conduct cutting-edge research in all aspects of spoken language processing and computer vision. This includes audio-visual fusion; visual-semantic reasoning; federated learning; speech recognition; speech enhancement; speaker recognition and diarization; machine reading comprehension; text summarization; multilingual language modeling; and related topics in natural language processing, understanding, and generation. It also includes face forgery detection; object detection and segmentation; dense pose, head, and mask tracking; action recognition; image and video captioning; and other topics in image and real-time video understanding. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for forward-looking topics such as audio-visual far-field meeting transcription, automatic meeting minutes generation, and multi-modal dialog systems. We evaluate our research on public benchmarks, with results that include breakthrough human parity performance on the Switchboard conversational speech recognition task and Stanford’s Conversational Question Answering Challenge (CoQA).

In addition to expanding our scientific understanding of speech, language, and vision, our work finds outlets in Microsoft products such as Azure Cognitive Services, HoloLens, Teams, Windows, Office, Bing, Cortana, Skype Translator, Xbox, and more.

The Cognitive Services Research group is managed by Michael Zeng.




Knowledge and Language

The Knowledge and Language team is part of the Cognitive Services Research (CSR) group. It focuses on cutting-edge research and on developing the next-generation framework for knowledge and natural language processing.

We work on problems including knowledge-boosted language modeling, knowledge extraction, knowledge graphs, summarization, and language understanding and generation. We conduct large-scale pre-training and domain-specific fine-tuning on internal and public data sets to develop state-of-the-art deep learning technologies for core knowledge and language problems in real applications.

Our work has resulted in multiple publications at top NLP conferences and first-place submissions on the CommonsenseQA and FEVER leaderboards.

Our recent work covers:
• Jointly pre-training knowledge graphs and language models
• Increasing the factual correctness of abstractive summaries via knowledge graphs
• Summarizing multi-party meeting transcripts
• Using positional bias in news articles for zero-shot summarization

Computer Vision

The Azure Computer Vision Research (ACVR) group is part of the Cognitive Services Research (CSR) group. It focuses on cutting-edge research in computer vision to advance the state of the art and develop the next-generation framework for visual recognition. The problems we are interested in include image classification; object detection and segmentation; motion analysis and object tracking; dense pose, head, and mask tracking; action recognition; image generation; real-time video understanding; visual representation learning; multi-modality representation learning; and unsupervised, self-supervised, and contrastive learning. We leverage large-scale GPU and CPU clusters as well as internal and public data sets to develop world-leading deep learning technologies for core vision problems, along with generic visual representations that can be customized to a wide range of downstream tasks and real applications. The team also runs Project Florence, which focuses on developing universal backbones with shared representations for a wide spectrum of visual categories, with the aim of accelerating the shipping of Microsoft vision products using state-of-the-art large-scale deep learning models.

Speech and Dialog

The former Speech and Dialog Research Group (SDRG) was responsible for fundamental advances in speech and language technologies, including speech recognition, language modeling, language understanding, spoken language systems, and multi-modal dialog systems. Its contributions included breakthrough human parity performance on the Switchboard conversational speech recognition task and Stanford’s Conversational Question Answering Challenge (CoQA). SDRG merged with the Azure computer vision group in 2020 to form the Cognitive Services Research group.



CSR organizes the Distinguished Talk Series to host discussions with leaders in academia and industry. If you’re interested in giving a talk, please contact Chenguang Zhu (






Speaker | Affiliation | Date | Title
Prof. Mohit Bansal | University of North Carolina | TBD | TBD
Dr. Jim Glass | MIT | 7/22/2021 | Recent Progress in Self-Supervised and Cross-Modal Speech Processing
Prof. Zhiting Hu | UCSD | 6/17/2021 | Text Generation with No (Good) Data: New Reinforcement Learning and Causal Frameworks
Prof. Nanyun Peng | UCLA | 5/27/2021 | Controllable Text Generation Beyond Auto-regressive Models
Prof. Ashton Anderson | University of Toronto | 4/09/2021 | The Cultural Structure of Online Platforms
Prof. Aditya Grover | Facebook AI Research/UCLA | 3/18/2021 | Transformer Language Models as Universal Computation Engines
Prof. Diyi Yang | Georgia Tech | 2/18/2021 | Language Understanding in Social Context: Theory and Practice
Prof. Song Han | MIT | 1/21/2021 | Putting AI on a Diet: TinyML and Efficient Deep Learning
Prof. Tianqi Chen | Carnegie Mellon University | 1/15/2021 | Elements of Learning Systems
Prof. Xiang Ren | University of Southern California | 12/18/2020 | Label Efficient Learning with Human Explanations
Prof. Jiajun Wu | Stanford University | 11/19/2020 | Neuro-Symbolic Visual Concept Learning
Prof. Fei Liu | University of Central Florida | 10/30/2020 | Toward Robust Abstractive Multi-Document Summarization and Information Consolidation
Prof. Vivian Yun-Nung Chen | National Taiwan University | 10/2/2020 | Are Your Dialogue Systems Robust and Scalable?
Prof. Meng Jiang | University of Notre Dame | 9/10/2020 | Scientific Knowledge Extraction: New Tasks and Methods