About
I received my B.S. degree in computer science from ShanghaiTech University and my Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology.
During my Ph.D. studies, I worked at the intersection of signal processing and machine learning, aiming to demystify deep learning with signal processing tools such as sparse coding. I was also selected as one of the world's top 2% of scientists.
After graduation, I joined Microsoft. With large language models (LLMs) becoming a key focus for productivity, I shifted my research toward LLMs and large multimodal models (LMMs) to align with industry interests.
I am currently leading a project on vision-language model (VLM) training and contributing to image generation and editing work at MSRA.
My current research interests span unified models, the Muon optimizer, SFT versus RL, and algorithmic aspects of neural architectures. I conducted some of the earliest explorations in several emerging areas, including:
1. Understanding when generation benefits understanding in unified models (ongoing);
2. Investigating the emergence of reasoning and planning abilities in LLMs and analyzing the theoretical gap between SFT and RL:
- NeurIPS’24 | ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
3. Unrolling classical algorithms into neural architectures—a modern revival now appearing under names such as “looped models” or “test‑time regression” (a minimal code sketch follows this list):
- ICLR’23 (Oral) | Sparse Mixture-of-Experts are Domain Generalizable Learners
- Some earlier works on GNNs (WCGCN, GF-CF) and CNNs (FPN-OAMP).
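To make the unrolling idea concrete, here is a minimal sketch (my own illustration, not code from any of the papers above) of unrolling ISTA for sparse coding into a small learned network, in the spirit of LISTA: each network layer corresponds to one iteration of the classical algorithm, with the step matrices and thresholds made learnable. All names (`UnrolledISTA`, `n_layers`, etc.) are hypothetical.

```python
# Illustrative sketch only: ISTA for sparse coding, unrolled into a small
# learnable network (LISTA-style). Each "layer" is one ISTA iteration.
import torch
import torch.nn as nn

class UnrolledISTA(nn.Module):
    def __init__(self, signal_dim: int, code_dim: int, n_layers: int = 5):
        super().__init__()
        self.n_layers = n_layers
        # W_e plays the role of (1/L) * D^T; S plays the role of I - (1/L) * D^T D.
        self.W_e = nn.Linear(signal_dim, code_dim, bias=False)
        self.S = nn.Linear(code_dim, code_dim, bias=False)
        # One learnable soft-threshold per unrolled iteration.
        self.theta = nn.Parameter(torch.full((n_layers,), 0.1))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        b = self.W_e(y)                  # encoder term, computed once
        x = torch.zeros_like(b)          # initial sparse-code estimate
        for k in range(self.n_layers):   # unrolled iterations = network depth
            z = b + self.S(x)
            x = torch.sign(z) * torch.relu(torch.abs(z) - self.theta[k])  # soft-threshold
        return x

# Example: encode a batch of 32 signals of dimension 64 into 128-dim sparse codes.
model = UnrolledISTA(signal_dim=64, code_dim=128)
codes = model(torch.randn(32, 64))
```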
Together with my amazing colleagues, I have applied these techniques to fields such as embodied AI (Habi, Diffusion Veteran) and AI for Science (Omni-DNA, MIMSID, MuDM, GraphormerV2).
In my spare time, I contribute to community projects on efficient LLM training on low-resource GPUs:
- BlockOptimizers | Full-parameter finetuning of 8B models on an RTX 3090 and 70B models on 4 A100s (a block-wise update sketch follows this list)
- LMMs-Engine | High-performance any-to-any modality model training framework.
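To give a sense of how full-parameter finetuning can fit into a small GPU memory budget, below is a rough sketch of the block-wise update idea (a simplification of mine in the spirit of block coordinate descent, not the actual BlockOptimizers implementation): only one block of layers carries optimizer state and receives updates at a time, so Adam memory scales with a block rather than with the whole model.

```python
# Rough sketch only, not the BlockOptimizers code: cycle through parameter
# blocks, keeping Adam state for just the currently active block.
import torch

def train_blockwise(model_blocks, compute_loss, data_loader,
                    steps_per_block=50, lr=1e-5):
    """model_blocks: list of nn.Module blocks (e.g., transformer layers).
    compute_loss: callable mapping a batch to a scalar loss (full forward pass)."""
    for active in model_blocks:                  # cycle through parameter blocks
        for block in model_blocks:               # freeze all blocks except the active one
            for p in block.parameters():
                p.requires_grad_(block is active)
        # Adam moments are allocated only for the active block's parameters.
        opt = torch.optim.AdamW(
            (p for p in active.parameters() if p.requires_grad), lr=lr)
        for _, batch in zip(range(steps_per_block), data_loader):
            loss = compute_loss(batch)           # forward through the whole model
            loss.backward()                      # gradients only for the active block
            opt.step()
            opt.zero_grad(set_to_none=True)
        del opt                                  # release this block's optimizer state
```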
I also write educational blog posts to share technical insights; these have accumulated more than 10k followers and 10k favorites:
- Lectures on Triton Programming
- Notes on Statistical Machine Learning
- Notes on Graph Neural Networks
- Notes on Navier-Stokes Equations
- Notes on Non-convex Optimization