Yifan Yang

Senior Research SDE

About

I am Yifan Yang, a Senior Research SDE at the Shanghai site of Microsoft Research Asia (MSRA), which I joined in 2021. My work focuses on visual content generation, multimodal foundation models, and general-purpose agentic systems, bridging research innovation with product-level deployment. I have published 30+ peer-reviewed papers in top venues including CVPR, ICCV, ECCV, ICLR, NeurIPS, and AAAI, and have been deeply involved in the development of Microsoft’s Phi model family (e.g., Phi-3 and Phi-4). Several of my techniques have been transferred into core Microsoft products, including Office and Azure. Our recent work, LLM2CLIP, improves cross-modal representation learning by incorporating large language models; it has been integrated into the Phi-4-mini pretraining pipeline and received the AAAI 2026 Outstanding Paper Award.

Google Scholar


If you are interested in internship opportunities or research collaborations, feel free to reach out at
📧 yifanyang@microsoft.com


    First-author and Corresponding-author Publications

    (* denotes co-first author, † denotes corresponding author)

    • LLM2CLIP: Powerful Language Model Unlocks Richer Cross-Modality Representation
      Weiquan Huang, Aoqi Wu, Yifan Yang†, et al.
      AAAI 2026 — Outstanding Paper Award  🏆

    • HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
      Ziqin Zhou, Yifan Yang†, et al.
      AAAI 2026

    • VidGuard-R1: AI-generated Video Detection and Explanation via Reasoning Multimodal Language Models and Reinforcement Learning
      Kyoungjun Park, Yifan Yang†, et al.
      ICLR 2026

    • Zoomer: Adaptive Image Focus Optimization for Black-box Multimodal Large Language Models
      Jiaxu Qian, Chendong Wang, Yifan Yang†, et al.
      Transactions on Machine Learning Research (TMLR), 2025

    • Attentive Mask CLIP
      Yifan Yang, et al.
      ICCV 2023

    • ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
      Yasheng Sun, Yifan Yang*, et al.
      NeurIPS 2023

    • Directional Self-supervised Learning for Heavy Image Augmentations
      Yalong Bai, Yifan Yang*, et al.
      CVPR 2022

    • VoLUT: Efficient Volumetric Streaming Enhanced by LUT-based Super-resolution
      Chendong Wang, Anlan Zhang, Yifan Yang†, et al.
      MLSys 2025

    • LoRaSC: Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning
      Siwei Li, Yifan Yang†, et al.
      EMNLP 2024 (Findings)

    • VIGOR: Reviving Cloud Gaming Sessions
      Zhaoyuan He, Yifan Yang*, et al.
      ACM CoNEXT 2024

    • Nerve: Real-time Neural Video Recovery and Enhancement on Mobile Devices
      Zhaoyuan He, Yifan Yang*, et al.
      Proceedings of the ACM on Networking (CoNEXT), 2024


    Technical Reports and Major Preprints (First / Corresponding Author)

    • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
      Marah Abdin, Sam Ade Jacobs, et al., Yifan Yang
      arXiv Technical Report, 2024

    • Phi-4-mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
      Abdelrahman Abouelenin, Atabak Ashfaq, et al., Yifan Yang
      arXiv Technical Report, 2025

    • ReasonGen-R1: Chain-of-Thought for Autoregressive Image Generation Models through Supervised Fine-tuning and Reinforcement Learning
      Yu Zhang, Yunqi Li, Yifan Yang†, et al.
      arXiv:2505.24875, under review at CVPR 2026

    • Video-in-the-loop: Span-grounded Long Video QA with Interleaved Reasoning
      Chendong Wang, Donglin Bai, Yifan Yang†, et al.
      Under review at ICLR 2026

    • Region-adaptive Sampling for Diffusion Transformers
      Ziming Liu, Yifan Yang†, et al.
      arXiv:2502.10389, under review at CVPR 2026

    • AdaNav: Adaptive Reasoning with Uncertainty for Vision-Language Navigation
      Xin Ding, Jianyu Wei, Yifan Yang, et al.
      arXiv:2509.24387, 2025

    • Diffusion²: Turning 3D Environments into Radio Frequency Heatmaps
      Kyoungjun Park, Yifan Yang†, et al.
      arXiv preprint, 2025


    Collaborative Publications

    • Unified Medical Image Pre-training in Language-Guided Common Semantic Space
      Xiaoxuan He, Yifan Yang, et al.
      ECCV 2024

    • StreamMind: Unlocking Full Frame-rate Streaming Video Dialogue through Event-gated Cognition
      Xin Ding, Hao Wu, Yifan Yang, et al.
      ICCV 2025

    • Efficient and Adaptive Diffusion Model Inference through Lookup Tables on Mobile Devices
      Qipeng Wang, Shiqi Jiang, Yifan Yang, et al.
      IEEE Transactions on Mobile Computing, 2025

    • Online Video Quality Enhancement with Spatial-Temporal Look-up Tables
      Zefan Qu, Xinyang Jiang, Yifan Yang, et al.
      ECCV 2024

    • ProLongVid: A Simple but Strong Baseline for Long-context Video Instruction Tuning
      Rui Wang, Bohao Li, Yifan Yang, et al.
      EMNLP 2025

    • MageBench: Bridging Large Multimodal Models to Agents
      Miaosen Zhang, Qi Dai, Yifan Yang, et al.
      WACV 2025

    • Reducio! Generating 1K Video within 16 Seconds Using Extremely Compressed Motion Latents
      Rui Tian, Qi Dai, Yifan Yang, et al.
      ICCV 2025

    • Expand Heterogeneous Learning Systems with Selective Multi-Source Knowledge Fusion
      Gengyuan Dai, Hongxu Xu, Yifan Yang, et al.
      AAAI 2026

    • Empowering Agentic Video Analytics Systems with Video Language Models
      Yuxuan Yan, Shiqi Jiang, Yifan Yang, et al.
      USENIX NSDI 2025

    • DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
      Brian Nlong Zhao, Yifan Yang, et al.
      ICLR 2025

    • Understanding and Improving Training-free Loss-based Diffusion Guidance
      Yifei Shen, Xinyang Jiang, Yifan Yang, et al.
      NeurIPS 2024

    • Online Video Super-resolution with Convolutional Kernel Bypass Grafts
      Jun Xiao, Xinyang Jiang, Yifan Yang, et al.
      IEEE Transactions on Multimedia, 2023