
Sunayana Sitaram

Senior Researcher


Hello, and thanks for stopping by! I’m a Senior Researcher at Microsoft Research India, where I work on Natural Language Processing. My research interests are broadly in democratizing AI, and the current focus of my work is on Massively Multilingual Language Models and Responsible AI. I’m excited about translating long-term, cutting-edge research in data, modeling and evaluation into large-scale real-world product improvements. I thrive in leading and working with diverse, interdisciplinary teams and regularly work with social scientists, linguists, designers and product groups.

I am the Director of the MSR India Research Fellow program. The RF program is a unique opportunity for students who are considering pursuing a career in research. The RF program exposes bright young minds to cutting-edge research and provides mentorship from some of the top researchers in the world. My goal is to provide the best possible experience for RFs during their stint at MSRI to help them realize their potential.

I am active in the NLP research community and have organized several special sessions (Interspeech 2016-2018, IWSDS 2023) and workshops (code-switching 2020 and 2021, evaluation 2022), and have given many invited talks in academia and industry. I will be co-chair of the Industry track at ACL 2022 and a Senior Area Chair of the Multilingualism track at ACL 2022.

For up-to-date information about publications, please take a look at my Google Scholar page.

I have been fortunate to work with many wonderful interns and Research Fellows who inspire me and keep me on my toes!

In reverse chronological order:

Kabir Ahuja (current RF), Krithika Ramesh (current intern), Shrey Pandit (current intern), Abhinav Rao (RF @ Microsoft Turing), Aniket Vashishtha (RF @ MSRI), Shaily Bhat (RF @ Google Research), Simran Khanuja (RF @ Google Research -> PhD @ Carnegie Mellon University), Anirudh Srinivasan (MS @ UT Austin), Sanket Shah (Salesken.ai), Brij Mohan Lal Srivastava (PhD @ INRIA -> Nijta (startup)), Sunit Sivasankaran (PhD @ INRIA -> Microsoft), Sai Krishna Rallabandi (PhD @ CMU -> Fidelity).


Please consider submitting a paper to the Languages special issue on “Interdisciplinary Approaches to Data Collection, Annotation and Computational Processing of Code-Switched Languages around the World”. You can find more details here.

I was invited to be a speaker at the VAIBHAV summit organized by the Govt. of India for the AI/ML Speech Understanding panel.

I was part of the Students Meet Experts session at Interspeech 2020 organized by the ISCA-SAC.

Our survey paper on code-switching, which covers more than 250 papers, is now available on arXiv.

Code and Datasets

Our code for evaluation of multilingual systems, LITMUS Predictor, is now open source. Please also check out the LITMUS Predictor demo here.

Our benchmark for evaluating code-switched NLP called GLUECoS is now open source, along with scripts for pre-processing 11 code-switched datasets! Get the code here.

We built the first code-switched NLI dataset using Bollywood movie data as premises. Check out the paper and data here. We also released a tool for Language Identification from text here.

Code-switched data for the Language Identification shared task organized as part of the First Workshop on Speech Technologies for Code-switching for Multilingual Communities is now available for research use.

I also organized a shared task on ASR for low-resource languages in a special session at Interspeech 2018. As part of this challenge, we released data from three low-resource Indian languages, which is now available for research use.

Prior to coming to MSR India

I finished my PhD in 2015 at the Language Technologies Institute, Carnegie Mellon University. I worked on Text-to-Speech systems with my advisor Alan W Black, and my thesis was on pronunciation modeling for low-resource languages. From 2010 to 2012, I was a Master’s student at CMU with Jack Mostow, and I worked on children’s oral reading prosody. I also interned with Microsoft Research India in Summer 2012, where we built a low-vocabulary ASR system for farmers in rural central India.