Sarathi-Serve
Sarathi-Serve (a research prototype) is a high-throughput, low-latency LLM serving framework. This repository contains a benchmark suite for evaluating LLM performance from a systems point of view. It contains various workloads and scheduling…
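As a rough illustration of what such a benchmark sweep might look like, the sketch below measures throughput and tail latency across combinations of workloads and scheduling policies. The workload labels, scheduler labels, and the `run_trial` helper are hypothetical stand-ins, not names from the Sarathi-Serve repository.

```python
# Hypothetical sketch of a benchmark sweep over workloads and schedulers.
# None of these names come from the Sarathi-Serve repository; they only
# illustrate measuring throughput/latency per configuration.
import itertools
import random
import statistics

WORKLOADS = ["chat", "summarization", "code"]   # assumed workload labels
SCHEDULERS = ["fcfs", "priority", "chunked"]    # assumed scheduler labels

def run_trial(workload: str, scheduler: str, seed: int) -> dict:
    """Stand-in for launching one benchmark run; returns synthetic metrics."""
    rng = random.Random(hash((workload, scheduler, seed)))
    return {
        "throughput_tok_s": rng.uniform(800, 1200),
        "p99_latency_ms": rng.uniform(50, 400),
    }

results = {}
for workload, scheduler in itertools.product(WORKLOADS, SCHEDULERS):
    trials = [run_trial(workload, scheduler, s) for s in range(3)]
    results[(workload, scheduler)] = {
        "throughput_tok_s": statistics.mean(t["throughput_tok_s"] for t in trials),
        "p99_latency_ms": statistics.mean(t["p99_latency_ms"] for t in trials),
    }

for (workload, scheduler), metrics in sorted(results.items()):
    print(f"{workload:15s} {scheduler:10s} "
          f"{metrics['throughput_tok_s']:8.1f} tok/s  "
          f"{metrics['p99_latency_ms']:6.1f} ms p99")
```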
Research Focus: Week of November 22, 2023
A new deep-learning compiler for dynamic sparsity; Tongue Tap could make tongue gestures viable for VR/AR headsets; Ranking LLM-Generated Loop Invariants for Program Verification; Assessing the limits of zero-shot foundation models in single-cell biology.
VIDUR: LLM Simulator
Vidur is a high-fidelity and extensible LLM inference simulator. It can help you with capacity planning, finding the best deployment configuration for your LLM deployments, and testing new research ideas such as new scheduling algorithms, optimizations…
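To make the capacity-planning use case concrete, here is a minimal sketch of searching over candidate deployment configurations with a simulator in the loop. The `simulate` function, its toy latency model, and the configuration fields are illustrative assumptions, not Vidur's actual API.

```python
# Hypothetical capacity-planning loop: simulate each candidate deployment and
# pick the cheapest one that meets a latency SLO. Not Vidur's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class DeploymentConfig:
    gpu: str
    tensor_parallel: int
    replicas: int

def simulate(config: DeploymentConfig, qps: float) -> float:
    """Stand-in for a simulator call; returns an estimated p99 latency (ms)."""
    capacity = 2.5 * config.tensor_parallel * config.replicas  # toy throughput model
    utilization = min(qps / capacity, 0.99)
    return 40.0 / (1.0 - utilization)  # latency blows up as utilization nears 1

CANDIDATES = [
    DeploymentConfig("A100", tp, replicas)
    for tp in (1, 2, 4)
    for replicas in (1, 2, 4, 8)
]

TARGET_QPS = 12.0
SLO_P99_MS = 200.0

# Choose the configuration with the fewest GPUs that still meets the SLO.
feasible = [c for c in CANDIDATES if simulate(c, TARGET_QPS) <= SLO_P99_MS]
best = min(feasible, key=lambda c: c.tensor_parallel * c.replicas)
print("chosen config:", best, "p99 =", round(simulate(best, TARGET_QPS), 1), "ms")
```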
Large Scale Intelligent Microservices – IEEE Big Data 2020 Paper Presentation
Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies, each with its own restrictive syntax. We introduce an…
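One common way to sidestep those restrictive, engine-specific syntaxes is to expose the model as a web microservice and score database rows over HTTP. The sketch below shows the general pattern; the endpoint URL, JSON schema, and `score_rows` helper are hypothetical and not taken from the paper's system.

```python
# Hypothetical sketch: scoring database rows against a model exposed as an
# HTTP microservice. The endpoint URL and JSON schema are illustrative only.
import json
import urllib.request

PREDICT_URL = "http://localhost:8080/predict"  # assumed endpoint

def score_rows(rows: list[dict]) -> list[dict]:
    """POST a batch of rows to the model service and attach its predictions."""
    payload = json.dumps({"instances": rows}).encode("utf-8")
    request = urllib.request.Request(
        PREDICT_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        predictions = json.loads(response.read())["predictions"]
    return [dict(row, prediction=p) for row, p in zip(rows, predictions)]

# Rows pulled from any SQL engine could be scored in batches like this.
rows = [{"age": 34, "plan": "premium"}, {"age": 52, "plan": "basic"}]
# print(score_rows(rows))  # requires the (hypothetical) service to be running
```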
Engineering Foundation
Engineering arm of Microsoft Research Asia. Microsoft Research Asia (MSRA) is renowned for its ability to develop and launch frontier research and open-source projects at an unprecedented pace. This is made possible in part by…