Publication
Towards Safer Heuristics With XPlain
Publication
Reviving Cloud Gaming Sessions
Publication
Input-Dependent Power Usage in GPUs
Dataset Source Code
RetroInfer
Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. RetroInfer is a novel system that rethinks the KV cache as vector storage within a GPU–CPU co-execution setup to…
Dataset Source Code
AttentionEngine: A Custom Model Optimization Framework
AttentionEngine accelerates transformer attention variants by generating efficient custom kernels, enabling model designers to easily create new variants with our flexible API.
Project
Practical System Verification
Formal verification is a promising approach to eliminate bugs at compile time, before software ships. Unfortunately, verifying the correctness of system software traditionally requires heroic developer effort. In this project, we aim to enable accessible, faster,…