vAttention
vAttention is a memory manager for the KV cache in LLM serving systems. It decouples the allocation of virtual memory from the allocation of physical memory using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand…
MGit: A Model Versioning and Management System
Networking Infrastructure Group
The Networking Infrastructure Group at Microsoft Research Asia engages in fundamental research on all aspects of computer networking and infrastructure.
Research Focus: Week of July 15, 2024
Advancing time series analysis with multi-granularity guided diffusion model; An algorithm-system co-design for fast, scalable MoE inference; What makes a search metric successful in large-scale settings; Learning to solve PDEs without simulated data.
Unified Database: Laying the foundation for large language model vertical applications
Unified databases offer better knowledge transfer between multimodal data types. They provide substantial corpus support for large language models and are poised to drive innovation in underlying hardware, laying the foundation for data-enhanced AI.
OSDI 2024
Microsoft is proud to sponsor the 18th USENIX Symposium on Operating Systems Design and Implementation. OSDI brings together professionals from academic and industrial backgrounds in what has become a premier forum…