Dhruv Deshmukh

Researcher

About

Researcher working on efficient LLM inference, with a focus on GPU kernel optimization and performance. Previously a Research Fellow at Microsoft Research India for 2 years, where I worked on Kascade, a sparse attention technique for long-context LLM inference.

I completed my B.Tech in Computer Science and Engineering at IIT Bhilai in 2023, where my final-year work focused on efficient training of distributed graph neural networks. I’m interested in systems for ML, performance optimization, and building efficient large-scale models.