Surajit Chaudhuri

Deputy Managing Director / Distinguished Scientist


I am a distinguished scientist and the managing director of XCG (part of Microsoft Research). My technical work is done in collaboration with members of the Data Management, Exploration and Mining group in XCG. Prior to joining Microsoft Research in Jan 1996, I worked at HP Labs, Palo Alto from 1992-1995. I received my Ph.D. from Stanford University and my B.Tech. from IIT, Kharagpur.


Query Result Navigation

Established: January 9, 2014

Exploratory queries on a database often return too few or too many results (e.g., a home search query on a database of available homes). In such cases, the user faces the challenges of (i) navigating through too many results and/or…

Synonym Mining

Established: January 7, 2014

The same entity is often referred to in a variety of ways. For example, the camera Canon 600d is also referred to as "canon rebel t3i", the celebrity Jennifer Lopez is also referred to as "jlo" and Seattle Tacoma International…

SQLVM: Performance Isolation in Multi-Tenant Relational Database-as-a-Service

Established: February 14, 2013

Multi-tenancy and resource sharing are essential to make a Database-as-a-Service (DaaS). However, resource sharing usually results in the performance of one tenant’s workload being affected by other co-located tenants. In the SQLVM project, our approach to performance isolation in…

Entity Search and Query Portals

Established: March 20, 2011

The goal of entity search is to return entities (e.g., people, products, locations) relevant to a keyword query. The goal of Query Portals is to go one step further and return not only the names of relevant entities but a…

Research Interests

  • Self-Tuning Technology for Database Systems
  • Multi-Tenant Database Systems
  • Enterprise Data Analytics
  • Text Analytics, Structured Data and Search

Past Projects

I started the AutoAdmin project in 1996, soon after joining MSR. The goal of this project is to make databases self-tuning and self-administering by exploiting knowledge of the workload. Vivek Narasayya was my primary collaborator in the early years, and we were subsequently joined by other colleagues in this effort. Our primary focus was on automated physical database design as well as automated statistics management in relational systems. The Index Tuning Wizard releases in Microsoft SQL Server 7.0 and SQL Server 2000 are based on the technology that we developed as part of this project. They were the first workload-driven commercial physical design tools for relational systems, recommending indexes and indexes plus materialized views, respectively. The scope of the automated physical design technology has since been expanded and made available in the Database Tuning Advisor feature of SQL Server 2005 and subsequent releases. The AutoAdmin project page has a detailed description of the project and its publications. Recently, I have become interested in the related problem of resource management for multi-tenant database systems.

I initiated the Data Cleaning project in 2000 with the goal of developing tools and server infrastructure to support data preparation, an essential step before effective data analysis. Venkatesh Ganti was our leading researcher on this project in the early years. Our work led to the fuzzy matching and fuzzy de-duplication transforms in the SQL Server Integration Services component of SQL Server 2005 and subsequent releases. In recent years, we have incorporated our Data Cleaning technology in Bing.

Text documents and structured relational data are both sources of information. Understanding the synergy between these two sources has been a longstanding interest of mine. I started looking at this problem in the mid-nineties (SIGMOD 1995), when we studied the problem of “join” between relational tables and text repositories. Later, we investigated the problem of keyword search over structured databases (IEEE ICDE 2002) and the problem of auto-ranking of answers in database queries (CIDR 2003, VLDB 2004, CIDR 2005). More recently, we have been looking at the problem of entity search (WWW 2008). Ideas from this project have been incorporated in Bing.

Last but not least, I am interested in the problem of supporting business intelligence and decision support queries more effectively on data platforms. In the past, I have worked on optimization of complex SQL queries, e.g., optimization of queries with group-by (VLDB 2004), user-defined predicates (VLDB 2006), exploiting factorization for index union/intersection plans (SIGMOD 2003), and data mining predicates (IEEE ICDE 2002). One of the directions I have pursued is revisiting the fundamental assumptions in query optimization (SIGMOD 2005, SIGMOD 2009). Currently, I am exploring techniques and tools for “Big Data” enterprise analytic platforms.

Awards

  • 2012 ICDE Influential Paper Award
  • 2011 ACM SIGMOD Edgar F. Codd Innovations Award
  • 2008 VLDB Best Paper Award (with Nico Bruno)
  • 2007 VLDB 10-Year Best Paper Award (with Vivek Narasayya)
  • 2005 ACM Fellow
  • 2004 ACM SIGMOD Contributions Award
  • 2000 IEEE ICDE Best Paper Award (with Vivek Narasayya)

Selected Professional Activities

  • 2010 ACM Symposium on Cloud Computing (SOCC): Program Co-Chair
  • 2006 ACM Conference on Management of Data (SIGMOD): Program Chair
  • 1999 ACM Conference on Knowledge Discovery and Data Mining (KDD): Program Co-Chair
  • 2011 IEEE Data Engineering Conference: Industrial Track Co-Chair
  • 2003 ACM SIGMOD Conference: Industrial Track Chair
  • 2001 ACM Conference on Knowledge Discovery and Data Mining: Industrial Track Co-Chair
  • 1999 ACM SIGMOD Conference: Industrial Track Co-Chair
  • 1998 IEEE Conference on Data Engineering (ICDE): Industrial Track Chair
  • 2002 IEEE Conference on Data Engineering (ICDE): Chair, OLAP and Data Warehousing Track
  • 2008 VLDB 10-year award committee, Chair
  • 2002 VLDB 10-year award committee, Member
  • ACM Transactions on Database Systems (TODS): Associate Editor, 2001-2007
  • IEEE Transactions on Knowledge and Data Engineering (TKDE): Associate Editor, 2001-2005
  • IEEE Data Engineering Bulletin: Associate Editor, 1998-1999

Invited Talks, Tutorials, and Surveys

  • Experiences with Problem #9: Invited Talk, SIGMOD 2011, Athens.
  • A Programming Framework for Data Cleaning, Distinguished Lecture, University of British Columbia, 2009.
  • An Overview of Business Intelligence Technology, CACM 2011. (with Umeshwar Dayal, Vivek Narasayya)
  • Self-Tuning Database Systems: A Decade of Progress. VLDB 2007. (with Vivek Narasayya)
  • Foundations of automated database tuning, Tutorial presented at ACM SIGMOD 2005, VLDB 2006. (with Gerhard Weikum)
  • Self-Managing Technology in Database Management Systems, Tutorial presented at VLDB 2004. (with Benoît Dageville, Guy M. Lohman)
  • Databases and IR: Perspectives of a SQL Guy, NSF Information and Data Management PI Workshop, Seattle, 2003
  • An Overview of Data Warehousing and OLAP technology. SIGMOD Record, March 1997. Tutorials presented at the 1996 VLDB, 1997 SIGMOD, 1998 EDBT, and 1998 IEEE ICDE conferences (with Umeshwar Dayal).
  • An Overview of Query Optimization in Relational Systems. Proceedings of 1998 ACM PODS. Invited Tutorial at ACM PODS Conference, 1998.

Technology Transfer

(in collaboration with project members)

  • SQL Server Index Tuning Wizard and Database Tuning Advisor (AutoAdmin project)
  • Fuzzy Lookup and Fuzzy Grouping Transforms in SQL Server Integration Services (Data Cleaning project)
  • Query Services and Catalog Data Quality for Bing Shopping (Data Cleaning and Entity Search projects)