I am a researcher at Microsoft Research India working in the areas of Machine Learning, Natural Language Systems and Applications, and Technology for Emerging Markets. My research interests lie broadly in Speech and Language Technology, especially the use of linguistic models to build technology that offers more natural Human-Computer and Computer-Mediated interactions.

I am currently working on Project Mélange, where we try to understand, process and generate code-mixed language data for both text and speech. Code-mixing, the use of more than one language in a single conversation or utterance, is a phenomenon observed in all multilingual societies. Though code-mixing has been studied in the past as a feature of conversational speech, the rapid rise of social media and other online forums has made it common in text as well. Conversational speech applications, such as personal assistants and speech-to-speech translation, make it imperative that we know how to model it in speech too.

Recently, I have become interested in how social and pragmatic functions affect language use, in code-mixed as well as monolingual conversations, and how to build effective computational models of sociolinguistics and pragmatics that can lead to more aware Artificial Intelligence.

I am also very passionate about NLP and Speech technology for Indian Languages. I believe that local language technology, especially with speech interfaces, can help millions of people gain entry into a world that has until now been almost inaccessible to them. I have served, and continue to serve, on several government and other committees that work on Indian Language Technologies as well as Linguistic Resources and Standards for NLP/Speech.


Project Mélange

Established: January 1, 2012

Project Mélange: Understanding MixEd LANguaGE and Code-mixing

The goal of Project Mélange is to understand the uses of and build tools around code-mixing. Multilingual communities exhibit code-mixing, that is, mixing of two or more socially stable languages in a single conversation, sometimes even in a single utterance. This phenomenon has been widely studied by linguists and interaction scientists in the spoken language of such communities. However, with the prevalence of social media and other informal…

Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?

December 2016

Linguistic research on multilingual societies has indicated that there is usually a preferred language for expression of emotion and sentiment. Paucity of data has limited such studies to participant interviews and speech transcriptions from small groups of speakers. In this paper, we report a study on 430,000 unique tweets from Indian users, specifically Hindi-English bilinguals,…
