About
Tianyu He is a Senior Researcher in the Machine Learning Group at Microsoft Research Asia. His research interests include machine learning, generative learning, and their applications to content understanding and creation. Before joining Microsoft in 2022, he spent three years on AI productization at Alibaba DAMO Academy. He has authored dozens of academic research papers at well-recognized international conferences such as NeurIPS, ICLR, CVPR, ICCV, and ECCV.
My long-term goal is to advance intelligence for the real world. Currently, I am dedicated to pioneering generative models:
- Interactive Video World Model (2024-):
- Tokenization:
- VidTok: A Versatile and Open-Source Video Tokenizer [Report][Code][Models][Blog][X][Liangziwei][Talk]
- VidTwin: Video VAE with Decoupled Structure and Dynamics [Report][Code][Website]
- Causal & Real-Time:
- MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft [Report][Code][X]
- Fast Autoregressive Video Generation with Diagonal Decoding [Report][Website]
- Playing with Transformer at 30+ FPS via Next-Frame Diffusion [Report][X][Liangziwei]
- Interactive:
- Video in-context Learning: Autoregressive Transformers are Zero-Shot Video Imitators [Report][Website][Liangziwei]
- IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI [Report][Website][Liangziwei]
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing [Report][Website][Code][Jiqizhixin]
- Consistent:
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling [Report][Website][Linkedin]
- Interactive 4D World Model (2024-):
- Interactive & Compositional:
- Compositional 3D-aware Video Generation with LLM Director [Report][X]
- Causal:
- AR4D: Autoregressive 4D Generation from Monocular Videos [Report][X]
- Representation:
- End-to-End Rate-Distortion Optimized 3D Gaussian Representation [Report][Code]
- Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis? [Report][X]
- Spatial Audio:
- Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration [Report][Website]
- Talking Avatar Generation (2022-2024):
- Video:
- GAIA: Zero-Shot Talking Avatar Generation [Report][X][Jiqizhixin]
- DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder [Report]
- InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation [Report][Website]
- 3D Face:
- HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details [Report][Website][PaperWeekly]
- Personalization:
- Memories are One-to-Many Mapping Alleviators in Talking Face Generation [Report][Website]
I am hiring interns to work on Generative Models and World Models! Please email me (tianyuhe@microsoft.com) if you are interested.