vAttention
vAttention is a memory manager for KV-cache in LLM serving systems. It decouples the allocation of virtual memory and physical memory using the CUDA virtual memory APIs. This approach enables allocating physical memory on demand while retaining the contiguity of KV-cache in virtual memory. This way, vAttention provides support for dynamic memory allocation to unmodified attention kernels.