Publication
A Call for Research on Storage Emissions
Project
MInference: Million-Tokens Prompt Inference for Long-context LLMs
MInference 1.0 leverages the dynamic sparse nature of LLMs' attention, which exhibits some static patterns, to speed up pre-filling for long-context LLMs. It first determines offline which sparse pattern…
Dataset Source Code
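The sparse-attention idea behind MInference can be illustrated with a toy sketch. This is not MInference's actual code or API: the mask below is a hypothetical "A-shape" pattern (attend to the first few sink tokens plus a local window), one simple pattern family used for long-context attention; real kernels skip the masked blocks entirely rather than masking a dense score matrix.

```python
# Illustrative sketch only, not the MInference implementation.
# Shows the core idea: attention scores are computed (here, kept) only where
# a precomputed sparse pattern allows.
import numpy as np

def sparse_attention(q, k, v, mask):
    # q, k, v: (seq_len, d); mask: (seq_len, seq_len) boolean, True = keep.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)      # drop disallowed positions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def a_shape_mask(n, sink=4, window=8):
    # Causal mask keeping only the first `sink` tokens and a local window.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & ((j < sink) | (i - j < window))

rng = np.random.default_rng(0)
n, d = 64, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = sparse_attention(q, k, v, a_shape_mask(n))
print(out.shape)  # (64, 16)
```

Because each query attends to at most `sink + window` keys instead of all preceding tokens, the cost of pre-filling grows roughly linearly rather than quadratically in sequence length.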
BitBLAS
BitBLAS is a library for mixed-precision matrix multiplications, designed especially for quantized LLM deployment.
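A mixed-precision matmul of the kind BitBLAS accelerates can be sketched as follows. This is an illustrative simulation, not the BitBLAS API: weights are quantized to the int4 range (stored here in int8) with a per-column scale, then dequantized and multiplied against fp32 activations; a real kernel fuses dequantization into the GEMM on the GPU.

```python
# Illustrative sketch only, not BitBLAS code.
# Simulates an fp32-activation x int4-weight matmul with per-column scales.
import numpy as np

def quantize_int4(w):
    # Symmetric per-column quantization of fp32 weights to [-8, 7].
    scale = np.abs(w).max(axis=0) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def mixed_precision_matmul(x, q, scale):
    # Dequantize on the fly, then multiply; a fused kernel avoids
    # materializing the full-precision weight matrix.
    return x @ (q.astype(np.float32) * scale)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)
w = rng.standard_normal((64, 32)).astype(np.float32)
q, scale = quantize_int4(w)
y = mixed_precision_matmul(x, q, scale)
print(y.shape)  # (4, 32)
```

Storing weights in 4 bits cuts weight memory traffic by roughly 8x versus fp32, which is the main win for memory-bound LLM inference.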