Sarathi-Serve
Sarathi-Serve (a research prototype) is a high-throughput, low-latency LLM serving framework. This repository contains a benchmark suite for evaluating LLM performance from a systems point of view. It provides a range of workloads and scheduling policies that can be combined to understand or tune the performance of an LLM under different scenarios. This repository is intended to be used in conjunction with our LLM performance simulator repository (https://github.com/microsoft/vidur-llm-simulator).