Dataset Source Code
vAttention
vAttention is a memory manager for KV-cache in LLM serving systems. It decouples the allocation of virtual memory and physical memory using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand…
Publication
MGit: A Model Versioning and Management System
Group
Networking Infrastructure Group
The Networking Infrastructure Group at Microsoft Research Asia engages in fundamental research on all aspects of computer networking and infrastructure.
Microsoft Research Blog
Research Focus: Week of July 15, 2024
Advancing time series analysis with multi-granularity guided diffusion model; An algorithm-system co-design for fast, scalable MoE inference; What makes a search metric successful in large-scale settings; Learning to solve PDEs without simulated data.