About
I am a Principal Researcher at Microsoft Research based in Montreal, Canada. I received my Ph.D. in Computer Science from University College London under the supervision of Dr. Emine Yilmaz. I joined Microsoft in 2006 and Bing—then called Live Search—in 2007 where over the years I shipped several search quality improvements and conducted research with strong focus on both academic and product impact. After 16 years working on the product side of Microsoft, I joined Microsoft Research in 2022. Before moving to Montreal, I have been part of the Microsoft labs in Hyderabad (India), Bellevue (USA), and Cambridge (UK). My research interests are at the intersection of information retrieval, deep learning, and FATE (Fairness, Accountability, Transparency, and Ethics).
Featured Items

Book: An Introduction to Neural Information Retrieval
Neural models have been employed in many Information Retrieval scenarios, including ad-hoc retrieval, recommender systems, multi-media search, and even conversational systems that generate answers in response to natural language questions. An Introduction to Neural Information Retrieval provides a tutorial introduction to neural methods for ranking documents in response to a query, an important IR task. The monograph provides a complete picture of neural information retrieval techniques that culminate in supervised neural learning to rank models including deep neural network architectures that are trained end-to-end for ranking tasks. In reaching this point, the authors cover all the important topics, including the learning to rank framework and an overview of deep neural networks. This monograph provides an accessible, yet comprehensive, overview of the state-of-the-art of Neural Information Retrieval.

PhD Thesis: Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.

Blog: Adversarial and reinforcement learning-based approaches to information retrieval
Traditionally, machine learning based approaches to information retrieval have taken the form of supervised learning-to-rank models. Recent advances in other machine learning approaches—such as adversarial learning and reinforcement learning—should find interesting new applications in future retrieval systems. At Microsoft AI…

Talk: Neural Information Retrieval: In search of meaningful progress (CIIR Talk Series, University of Massachusetts Amherst)
The emergence of deep learning based methods for search poses several challenges and opportunities not just for modeling, but also for benchmarking and measuring progress in the field. Some of these challenges are new, while others have evolved from existing challenges in IR benchmarking exacerbated by the scale at which deep learning models operate. Evaluation efforts such as the TREC Deep Learning track and the MS MARCO public leaderboard are intended to encourage research and track our progress, addressing big questions in our field. The goal is not simply to identify which run is “best” but to move the field forward by developing new robust techniques, that work in many different settings, and are adopted in research and practice. This entails a wider conversation in the IR community about what constitutes meaningful progress, how benchmark design can encourage or discourage certain outcomes, and about the validity of our findings. In this talk, I will present a brief overview of what we have learned from our work on MS MARCO and the TREC Deep Learning track—and reflect on the state of the field and the road ahead.