Online collaboration with real-time communication is becoming increasingly important. Multimedia computing and AI technologies are the two pillars of our strategy for delivering the best possible user experience in real-time online collaboration. In the Media Computing Group, we have been investing in these two strategic areas for two decades and working closely with our business groups to deliver real-time, intelligent, and immersive media experiences to customers. Our long-term vision is to advance multimedia technologies to help the evolution of computing technologies, and vice versa. The group's current research areas include computer vision, audio and speech, media compression, and real-time communication.

News (7/18/2020): We’re hiring! For anyone who is interested in a Researcher or RSDE position, please feel free to contact us. (yanlu at microsoft)

Research Topics

Computer Vision – Our computer vision research focuses on three areas: scene understanding, visual recognition, and visual media manipulation. Specifically, we tackle fundamental problems and promote applications including 2D/3D scene parsing, 3D reconstruction, 2D/3D object detection, video classification, video object segmentation, multi-view correspondence learning, and video enhancement and retouching. We maintain a strong academic record and also contribute our techniques to Microsoft products, such as video background blur/replacement and Together mode for virtual group meetings in Microsoft Teams.

Audio and Speech – Our research aims to provide real-time and intelligent audio and speech technologies for real-world applications. We’re rolling out a new "blur for voice" feature in Microsoft Teams, which uses deep speech enhancement to eliminate distracting background noise. We also apply AI to fill in gaps in the voice signal so that the conversation sounds like a steady stream. Other related research topics include AI-based echo cancellation, speech super-resolution, speech recovery, and quality control in a real-time audio pipeline.

Media Compression – We develop advanced media compression technologies for image, video, and graphics. A major effort is the screen codec (a.k.a. Titanium), which we developed to improve the screen-sharing experience in various Microsoft products. Our group is also an active contributor to video coding standards such as MPEG-4, H.264/AVC, and H.265/HEVC. Our current focus is to develop an AI-powered media compression framework.

AI-based RTC Optimization – Many rules and code developed with a traditional systems approach may result in suboptimal performance. The latest advances in AI can be leveraged to replace these rules with models trained on real-world data. Reinforcement-learning-based RTC optimization represents a paradigm shift toward AI-based software engineering. Our research aims to advance AI to optimize the quality and reduce the latency of audio, video, and screen sharing.



Yan Lu

Partner Research Manager

Bin Li

Senior Researcher

Huaying Xue


Jiahao Li

Senior Researcher

Jingjing Fu


Jinglu Wang

Senior Researcher

Lei Chu


Xiang Ming


Xiangyu Kong


Xiao Li

Senior Researcher

Xiulian Peng

Principal Researcher

Xun Guo

Principal Researcher

Yuan Zhou


Yue Gao