Welcome!

I work at the intersection of multilingual and multi-cultural NLP, evaluation, and responsible AI, with a focus on ensuring that language technologies work equitably across diverse languages, cultures, and communities. Recently, my work has involved participatory approaches to evaluation, data collection, and policy to ensure that AI models and systems reflect the preferences of users from diverse regions and cultures. I have also been focusing on multilingual, multi-cultural synthetic data for post-training language models, and we recently released Updesh, a large dataset with ~8M data points covering 13 Indian languages.
At Microsoft Research India, I collaborate with and lead interdisciplinary teams that span NLP, machine learning, linguistics, HCI, and social science. I also actively contribute to the research community through conference organization and reviewing. In 2026, I am serving as an Area Chair for ACL Rolling Review (ARR) and for COLM, and I was Tutorial Chair for IndoML 2025. I’m also on the core Program Committee for the Research Symposium, a flagship event of the India AI Impact Summit 2026.
I collaborate actively with product groups within Microsoft and am building out a team of Applied Scientists in Bangalore. Recently, my research has shipped in the M365 Copilots, our GenAI-based suite of productivity tools, which now supports 52 languages. I have also contributed to Microsoft’s policy efforts on multilingual product strategy, focusing on inclusivity and language diversity.
🧪 Multilingual Evaluation
I led the creation of MEGA (Multilingual Evaluation of Generative AI, 2023), the first large-scale benchmark to evaluate generative LLMs on 16 NLP datasets across 70 typologically diverse languages. MEGA revealed significant disparities between English and low-resource languages and proposed a modular framework for multilingual evaluation. Building on MEGA, we created MEGAVERSE, an even broader evaluation effort covering 83 languages and 22 datasets, including multimodal tasks. MEGAVERSE benchmarked a wide range of models and performed detailed analysis of language coverage and data contamination. Prior to this, I also spearheaded the creation of the first benchmark for code-mixing, GLUECoS (2020).
👥 Participatory Evaluation at Scale
I believe evaluation should reflect the voices of real users. With this in mind, we introduced Pariksha, a scalable, transparent, and community-driven evaluation exercise for Indian LLMs in collaboration with Karya. Pariksha brings together 90,000 human and 30,000 automated evaluations across 10 Indian languages and 29 models, making it perhaps the largest multilingual human evaluation of LLMs conducted to date. This year, we are expanding on this effort with the Samiksha project: we have created a v1 benchmark of more than 20k data points across 11 languages and conducted 100k human evaluations of several Indic and global models. Findings from our pilot study will be published at CHI 2026 (preprint).
🤝 Participatory Responsible AI
As part of my work on Responsible AI, I co-led a participatory effort to address misgendering in LLM applications. We co-designed culturally grounded, multilingual guardrails with native speakers across 42 languages, and showed how these guardrails can reduce harms like misgendering without degrading performance. This work was recognized with an internal Open Data Award at Microsoft and serves as a blueprint for mitigating culturally sensitive harms in AI systems.
📚 Selected Publications
- Building Benchmarks from the Ground Up: Community-Centered Evaluation of LLMs in Healthcare Chatbot Settings (to appear in CHI 2026)
- Uncovering inequalities in new knowledge learning by large language models across different languages (PNAS 2025)
- A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications (EMNLP 2025)
- MEGA: Multilingual Evaluation of Generative AI (EMNLP 2023)
- MEGAVERSE: Benchmarking LLMs Across Languages, Modalities, Models and Tasks (NAACL 2024)
- Pariksha: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data (EMNLP 2024)
- METAL: Towards Multilingual Meta-Evaluation (NAACL 2024)
For a full list of publications, please see my Google Scholar page.
🎤 Recent Talks
- Invited talk, pre-summit event on “Community-Driven AI: A Roadmap for India’s Last-Mile”, IIT Delhi, Jan 2026
- Invited talk, York University (online), Jan 2026
- Invited talk, Youth for Tech Futures, The Pranava Institute, Jan 2026
- Keynote, D&I track, AACL-IJCNLP 2025
- Panelist, CHOMPS workshop, AACL-IJCNLP 2025
- Invited talk, IndoML 2025
- Keynote, CLRLC workshop, NeurIPS 2025
- Invited talk, NIMHANS workshop on AI and mental health, 2025
- Invited talk, Advanced Summer School on NLP (IASNLP-2025)
- Keynote, I Can’t Believe It’s Not Better (ICBINB 2025) @ ICLR 2025
- Keynote, Computational Approaches to Linguistic Code-Switching @ NAACL 2025
- Invited talk, International Network of Safety Institutes, Feb 2025
- Invited talk, Language Technologies for All meeting, UNESCO headquarters, Paris, Feb 2025
🧑‍🤝‍🧑 Team
I have been fortunate to work with many wonderful interns and Research Fellows who inspire me and keep me on my toes! In reverse chronological order:
Prashant Kodali (current PostDoc), Sourabrata Mukherjee (current PostDoc), Manan Uppadhyay (current RF), Sanchit Ahuja (RF -> Northeastern PhD), Varun Gumma (RF), Divyanshu Aggrawal (RF), Ishaan Watts (intern -> CMU MS), Ashutosh Sathe (intern -> Google DeepMind), Prachi Jain (PostDoc -> Senior Applied Scientist, Microsoft), Kabir Ahuja (PhD at University of Washington), Krithika Ramesh (PhD at Johns Hopkins University), Shrey Pandit (MS at UT Austin), Abhinav Rao (RF @ Microsoft Turing -> MS at CMU), Aniket Vashishtha (RF @ MSRI), Shaily Bhat (RF @ Google Research -> PhD at CMU), Simran Khanuja (RF @ Google Research -> PhD @ Carnegie Mellon University), Anirudh Srinivasan (MS @ UT Austin), Sanket Shah (Salesken.ai), Brij Mohan Lal Srivastava (PhD @ INRIA -> Nijta (startup)), Sunit Sivasankaran (PhD @ INRIA -> Microsoft), Sai Krishna Rallabandi (PhD @ CMU -> Fidelity).
🕰️ Prior to coming to MSR India
I received my PhD in 2015 from the Language Technologies Institute, Carnegie Mellon University, where I worked on text-to-speech systems with my advisor Alan W Black; my thesis was on pronunciation modeling for low-resource languages. From 2010 to 2012, I was a Master’s student at CMU with Jack Mostow, working on children’s oral reading prosody. I also interned with Microsoft Research India in Summer 2012, where we built a small-vocabulary ASR system for farmers in rural central India.