About
I received my B.S. degree in computer science from ShanghaiTech University and my Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology.
During my Ph.D. studies, I focused on the intersection of signal processing and machine learning, aiming to demystify deep learning through signal processing tools such as sparse coding. Additionally, I was the first to introduce graph neural networks (GNNs) to communication and networking (V1, V2), providing comprehensive theoretical analysis and practical guidelines. In 2025, I was selected as one of the top 2% of scientists worldwide in the field of Networking & Telecommunications (rank: 2819).
After graduation, I joined Microsoft. With large language models (LLMs) becoming a key focus for productivity, I shifted my research toward LLMs and large multimodal models (LMMs) to align with industry interests.
First, I concentrated on exploring the inner workings of these models using signal processing tools, with the goal of enhancing both their trustworthiness and performance. We were among the first to:
1. Analyze the emergence of reasoning and planning capabilities within LLMs and the gap between supervised fine-tuning (SFT) and reinforcement learning (RL), applying these insights to real-world agents.
- NeurIPS’24 | ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models
- Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
2. Investigate LMMs using sparse coding tools, applying them to reduce hallucinations.
3. Provide theoretical guidelines for Mixture of Experts (MoE) structures, applying them to vision foundation models.
Second, I focused on training systems for LLMs and LMMs:
- BlockOptimizers | Full-parameter fine-tuning of 8B models on an RTX 3090 and 70B models on 4 A100s
- LMMs-Engine | High-performance any-to-any modality model training framework
Together with my amazing colleagues, we applied these techniques to fields such as embodied AI (Habi, Diffusion Veteran) and AI for Science (Omni-DNA, MIMSID, MuDM, GraphormerV2).