Neural Representation Learning in the Wild: Toward Generalizable Representations and Scalable Citizen Science for Brain-Computer Interfaces
This talk will explore how large-scale self-supervised learning, combined with our latest multimodal neurotechnology—the newly released Muse integrating EEG and functional near-infrared spectroscopy (fNIRS)—and an open citizen science platform, accelerates the development of robust and…
VidTok introduces compact, efficient tokenization to enhance AI video processing
The VidTok method enables AI systems to process and generate videos more effectively. Compact tokenization reduces computational costs while maintaining video quality across a diverse range of applications.
World and Human Action Models towards gameplay ideation (Supplementary Video 1)
Supplementary Video 1 provided with the article “World and Human Action Models towards gameplay ideation” (Kanervisto et al. 2025, https://www.nature.com/articles/s41586-025-08600-3). We show video case studies of WHAM-generated gameplay sequences that demonstrate…
WHAM Demonstrator tutorial
Introducing Muse, our World and Human Action Model (WHAM). Muse is a generative AI model of a video game that can generate game visuals, controller actions, or both. It can predict how the game will…
Introducing Muse: Our first generative AI model designed for gameplay ideation
Today Nature published Microsoft’s research detailing our WHAM, an AI model that generates video game visuals & controller actions. We are releasing the model weights, sample data, & WHAM Demonstrator on Azure AI Foundry, enabling…
ε-VAE: Denoising as Visual Decoding
GASP: Gaussian Avatars with Synthetic Priors
Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of its most popular applications is creating animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of…