Online collaboration with real-time communication is becoming increasingly important. Multimedia computing and AI technologies are the two pillars of our strategy for delivering the best possible user experience in real-time online collaboration. In the Media Computing Group, we have been investing in these two strategic areas for two decades and have been working closely with our business groups to deliver real-time, intelligent, and immersive media experiences to customers. Our long-term vision is to advance multimedia technologies to help the evolution of computing technologies, and vice versa. The group's current research areas include computer vision, audio and speech, media compression, and real-time communication.
News (7/18/2020): We’re hiring! If you are interested in a Researcher or RSDE position, please feel free to contact us (yanlu at microsoft).
Computer Vision – Our computer vision research focuses on three areas: scene understanding, visual recognition, and visual media manipulation. Specifically, we tackle fundamental problems and promote applications including 2D/3D scene parsing, 3D reconstruction, 2D/3D object detection, video classification, video object segmentation, multi-view correspondence learning, and video enhancement and retouching. We sustain academic excellence and also contribute our techniques to Microsoft products, such as video background blur/replacement and Together mode for virtual group meetings in Microsoft Teams.
Audio and Speech – Our research aims to provide real-time and intelligent audio and speech technologies for real-world applications. We’re rolling out a new "blur for voice" feature for Microsoft Teams, which uses deep speech enhancement to eliminate distracting background noise. We also apply AI to fill in gaps in the voice signal so that it sounds like a steady stream of conversation. Other related research topics include AI-based echo cancellation, speech super-resolution, speech recovery, and quality control in a real-time audio pipeline.
Media Compression – We develop advanced media compression technologies for images, video, and graphics. A major result is the screen codec (a.k.a. Titanium), which we developed to improve the screen-sharing experience in various Microsoft products. Our group is also an active contributor to video coding standards such as MPEG-4, H.264/AVC, and H.265/HEVC. Our current focus is to develop an AI-powered media compression framework.
AI-based RTC Optimization – Many of the rules and much of the code developed with a traditional systems approach may result in suboptimal performance. The latest advances in AI can be leveraged to replace these rules with models trained on real-world data. Reinforcement-learning-based RTC optimization represents a paradigm shift toward AI-based software engineering. Our research aims to advance AI to improve the quality and reduce the latency of audio, video, and screen sharing.