About
I am a Research Engineer at the Azure Research – Systems group, where I work on improving the efficiency of Microsoft’s AI infrastructure.
My current work sits at the intersection of computer systems and generative AI, with emphasis on the following areas:
- Reliable and efficient agentic workflow serving: Sherlock, Murakkab
- Rethinking resource management granularity for generative model serving at scale: OpScale
- Efficient multimodal model serving: multimodal input (ModServe, SoCC25) and multimodal generation workflows (StreamWise)
- Long-context LLM serving at multi-million-token scale: Medha
- Power-aware LLM serving: μ-Serve (ATC24), TAPAS (ASPLOS25)
I obtained my PhD in Computer Science from UIUC with a thesis on cloud systems management using efficient and robust online learning. My PhD research lies at the intersection of systems and machine learning. Examples include:
- Energy-efficient, SLO-aware LLM serving: ATC24, AIOps24, SoCC24
- Robust, at-scale ML model deployment in cloud systems: ATC23, NeurIPS23, MLSys24
- Multi-tenant serverless computing resource management: SoCC22, NeurIPS22, EuroMLSys22
- SLO-oriented resource management for cloud microservices: OSDI20