About
I am Yifan Yang, a Senior Research SDE at the Microsoft Research Asia (MSRA) Shanghai site, which I joined in 2021. My work focuses on visual content generation, multimodal foundation models, and general-purpose agentic systems, bridging research innovation with product-level deployment. I have published 30+ peer-reviewed papers in top venues including CVPR, ICCV, ECCV, ICLR, NeurIPS, and AAAI, and have been deeply involved in the development of Microsoft’s Phi model family (e.g., Phi-3 and Phi-4). Several of my techniques have been transferred into core Microsoft products, including Office and Azure. Our recent work, LLM2CLIP, improves cross-modal representation learning by incorporating large language models, has been integrated into the Phi-4-mini pretraining pipeline, and received the AAAI 2026 Outstanding Paper Award.
Google Scholar
If you are interested in internship opportunities or research collaborations, feel free to reach out at
📧 yifanyang@microsoft.com
-
First-author and Corresponding-author Publications
(* denotes co-first author, † denotes corresponding author)
-
LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
Weiquan Huang, Aoqi Wu, Yifan Yang†, et al.
AAAI 2026 — Outstanding Paper Award 🏆 -
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Ziqin Zhou, Yifan Yang†, et al.
AAAI 2026 -
VidGuard-R1: AI-generated Video Detection and Explanation via Reasoning Multimodal Language Models and Reinforcement Learning
Kyoungjun Park, Yifan Yang†, et al.
ICLR 2026 -
Zoomer: Adaptive Image Focus Optimization for Black-box Multimodal Large Language Models
Jiaxu Qian, Chendong Wang, Yifan Yang†, et al.
Transactions on Machine Learning Research (TMLR), 2025 -
Attentive Mask CLIP
Yifan Yang, et al.
ICCV 2023 -
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun, Yifan Yang*, et al.
NeurIPS 2023 -
Directional Self-supervised Learning for Heavy Image Augmentations
Yalong Bai, Yifan Yang*, et al.
CVPR 2021 -
VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-resolution
Chendong Wang, Anlan Zhang, Yifan Yang†, et al.
MLSys 2025 -
LoRaSC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
Siwei Li, Yifan Yang†, et al.
EMNLP 2024 (Findings) -
VIGOR: Reviving Cloud Gaming Sessions
Zhaoyuan He, Yifan Yang*, et al.
ACM CoNEXT 2024 -
Nerve: Real-time Neural Video Recovery and Enhancement on Mobile Devices
Zhaoyuan He, Yifan Yang*, et al.
Proceedings of the ACM on Networking (CoNEXT), 2024
Technical Reports and Major Preprints (First / Corresponding Author)
-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Marah Abdin, Sam Ade Jacobs, et al., Yifan Yang
arXiv Technical Report, 2024 -
Phi-4-mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin, Atabak Ashfaq, et al., Yifan Yang
arXiv Technical Report, 2025 -
ReasonGen-R1: Chain-of-Thought for Autoregressive Image Generation Models through Supervised Fine-tuning and Reinforcement Learning
Yu Zhang, Yunqi Li, Yifan Yang†, et al.
arXiv:2505.24875, under review at CVPR 2026 -
Video-in-the-loop: Span-grounded Long Video QA with Interleaved Reasoning
Chendong Wang, Donglin Bai, Yifan Yang†, et al.
Under review at ICLR 2026 -
Region-adaptive Sampling for Diffusion Transformers
Ziming Liu, Yifan Yang†, et al.
arXiv:2502.10389, under review at CVPR 2026 -
AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
Xin Ding, Jianyu Wei, Yifan Yang, et al.
arXiv:2509.24387, 2025 -
Diffusion²: Turning 3D Environments into Radio Frequency Heatmaps
Kyoungjun Park, Yifan Yang†, et al.
arXiv preprint, 2025
-
Collaborative Publications
-
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He, Yifan Yang, et al.
ECCV 2024 -
StreamMind: Unlocking Full Frame-rate Streaming Video Dialogue through Event-gated Cognition
Xin Ding, Hao Wu, Yifan Yang, et al.
ICCV 2025 -
Efficient and Adaptive Diffusion Model Inference through Lookup Tables on Mobile Devices
Qipeng Wang, Shiqi Jiang, Yifan Yang, et al.
IEEE Transactions on Mobile Computing, 2025 -
Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
Zefan Qu, Xinyang Jiang, Yifan Yang, et al.
ECCV 2025 -
ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
Rui Wang, Bohao Li, Yifan Yang, et al.
EMNLP 2025 -
MageBench: Bridging Large Multimodal Models to Agents
Miaosen Zhang, Qi Dai, Yifan Yang, et al.
WACV 2025 -
Reducio! Generating 1K Video within 16 Seconds Using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Yifan Yang, et al.
ICCV 2025 -
Expand Heterogeneous Learning Systems with Selective Multi-Source Knowledge Fusion
Gengyuan Dai, Hongxu Xu, Yifan Yang, et al.
AAAI 2026 -
Empowering Agentic Video Analytics Systems with Video Language Models
Yuxuan Yan, Shiqi Jiang, Yifan Yang, et al.
USENIX NSDI 2025 -
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yifan Yang, et al.
ICLR 2025 -
Understanding and Improving Training-free Loss-based Diffusion Guidance
Yifei Shen, Xinyang Jiang, Yifan Yang, et al.
NeurIPS 2024 -
Online Video Super-resolution with Convolutional Kernel Bypass Grafts
Jun Xiao, Xinyang Jiang, Yifan Yang, et al.
IEEE Transactions on Multimedia, 2023