Theme: Machine Learning and Natural Language Systems

Machine Learning and AI | India

The ML and AI group at Microsoft Research India tackles some of the coolest, hardest, most fascinating, and most impactful research problems in AI, ranging from theory to advances in large-scale AI models. Our research has won best paper awards at premier conferences and has benefitted hundreds of millions of people worldwide. See our publications for examples of our work.

The group offers a warm and welcoming environment where diverse people and projects thrive. We work hard to provide support mechanisms which allow people the flexibility to explore different paths to success, including:

Choosing to strive for excellence in individual projects vs collaborating in teams to achieve much more than any individual could.
Pursuing fundamental research in core areas vs working on highly interdisciplinary projects at the intersection of artificial intelligence, machine learning, natural language processing, information retrieval, systems, theory, and related areas.
Studying the theoretical foundations of problems vs taking an empirical approach and tackling applications end-to-end.
Choosing the ideal mix of academic, social and product impact that is appropriate for each person.

Researchers, engineers, and data scientists pursuing all these approaches have found success in our group at MSR India. In addition to exploring the frontiers of new research areas in a bottom-up fashion, we also provide top-down support for mature projects by facilitating research collaborations and providing strong engineering and data-science support.

Here’s a sample of our projects that are extending the state of the art in machine learning.

Extreme Classification: This project tackles the largest classification problems in the world, involving millions to billions of categories. It started out as a bottom-up exploration by an individual researcher in a core area of machine learning. After a decade, the project is still going strong and has evolved into an interdisciplinary effort with key contributions from researchers in diverse areas of computer science and strong engineering support from the lab. Along the way, the project has started a new area of machine learning research which is thriving in both academia and industry. The extreme classification group has published highly impactful papers in leading conferences including ICML, KDD, NeurIPS, WSDM, and WWW, where the quality of their research has been recognized by best paper awards. The group also maintains the Extreme Classification Repository, which has become a vital resource for carrying out open and reproducible academic research. At the same time, the group has won multiple awards for their impact and engineering excellence. As a result of all these contributions, as well as the group’s open research policy towards publishing and the release of source code, extreme classifiers are now widely deployed in the tech industry, where they are making billions of predictions a day, are improving the productivity of billions of people, and have significantly benefitted many businesses worldwide.

Aligning AI models for large-scale decision-making: Some of the hardest problems in the world involve decision-making in large action spaces, such as in healthcare, robotics, information dissemination, and recommendations. Here we envision a new paradigm of dealing with missing data in decision-making scenarios by using large language models (LLMs) like GPT-4 for feedback and data augmentation, or directly as teachers for solving the task. To enable this paradigm, we are developing scalable RL-based algorithms for optimizing a model under non-differentiable rewards (based on LLMs or other task-specific constraints). Empirically, we are finding that LLMs bring complementary capabilities leading to state-of-the-art accuracy for diverse tasks such as learning a causal world model and retrieving relevant documents from a corpus.
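To make the idea of optimizing under a non-differentiable reward concrete, here is a minimal sketch, not the group's actual algorithm. It uses the standard score-function (REINFORCE) estimator on a toy softmax policy over four actions. The function black_box_reward is a hypothetical stand-in for an LLM judgement or task-specific constraint, and every name and hyperparameter below is an assumption chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_reward(action):
    # Hypothetical stand-in for an LLM judgement or a task constraint:
    # it returns a score for an action but exposes no gradient.
    return 1.0 if action == 2 else 0.0


def train_policy(num_actions=4, steps=2000, lr=0.1, seed=0):
    """Train a softmax policy with the REINFORCE score-function estimator."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = black_box_reward(action)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_k of log pi(action) equals 1[k == action] - probs[k],
        # so the reward itself never needs to be differentiated.
        for k in range(num_actions):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)


final_probs = train_policy()
print("Learned action probabilities:", [round(p, 3) for p in final_probs])
```

The reward is only ever evaluated, never differentiated, which is what lets the scoring function be an arbitrary black box. Scaling this to large action spaces and real LLM feedback is the hard part that the sketch does not address.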

Making LLMs more inclusive to languages of the world: Large Language Models are being used in various applications today and have the potential to revolutionize the future of work. However, current models perform worse for non-English languages than for English, particularly for languages written in scripts other than the Latin script, and for low-resource languages. Our research focuses on scaling up multilingual evaluation so that we can determine the strengths and weaknesses of models in non-English languages. We also work on parameter-efficient fine-tuning methods for improving the performance of LLMs on languages other than English.

AI4Code: Software engineering is one of the thriving AI research areas, and arguably the biggest beneficiary of generative AI. At MSR India, we are going after some of the high-impact research questions around how to build robust generative AI-based systems that accelerate software engineering tasks beyond code completion, such as rewriting entire repositories to improve quality, fixing compilation errors, and repository-wide package migration.

Causal Machine Learning: To build reasoning systems of the future, a key ingredient is the ability to learn causal relationships from data. Our group has pioneered research at the intersection of causality and machine learning: using machine learning to increase robustness of causal effect inference methods, and developing causality-based algorithms for improving generalization of predictive machine learning models. To accelerate adoption of causality, we developed the DoWhy library as a research prototype. Today the library is widely deployed in industry and academia with over 2M installations and attracts contributions from leading universities of the world. We are also a founding member of PyWhy, an open ecosystem for causal machine learning that is driving research towards end-to-end systems for causal learning.
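As a small illustration of what causal effect inference involves, the sketch below estimates a treatment effect on synthetic data in which a binary confounder influences both treatment and outcome. It uses plain backdoor adjustment by stratification, written in standard-library Python. It does not use or reproduce the DoWhy API, and the data-generating process (a true effect of 2.0) and all function names are assumptions made for this example.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (confounder, treatment, outcome) rows with a true effect of 2.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                   # confounder
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0   # treatment depends on z
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)          # outcome depends on t and z
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and control, ignoring z."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted_effect(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    effect = 0.0
    for z_value in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_value]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        weight = len(stratum) / len(rows)
        effect += weight * (mean(treated) - mean(control))
    return effect


rows = simulate()
naive = naive_effect(rows)
adjusted = backdoor_adjusted_effect(rows)
print(f"naive estimate: {naive:.2f}, backdoor-adjusted estimate: {adjusted:.2f}")
```

The naive comparison mixes the effect of the treatment with the effect of the confounder, while the stratified estimate conditions on the confounder and should land close to the simulated effect. Real analyses rarely have a single observed binary confounder, which is where dedicated tooling becomes useful.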

We are actively looking for exceptional researchers, engineers and data scientists in all areas of AI and ML. Please visit MSR India’s career opportunities page to see what positions are currently open.