Serving Models, Fast and Slow: Optimizing Heterogeneous LLM Inferencing Workloads at Scale
Kunal Jain, A. Parayil, Ankur Mallick, Rujia Wang, Renee St. Amant, Chetan Bansal, Victor Ruehle, Saravan Rajmohan, Shashwat Jaiswal, Yogesh Simmhan, Anoop Kulkarni, Steve Kofsky
ACM SIGMETRICS 2026 | June 2026