- Self-Tuning Technology for Database Systems
- Multi-Tenant Database Systems
- Enterprise Data Analytics
- Text Analytics, Structured Data and Search
I started the AutoAdmin project in 1996, soon after joining MSR. The goal of this project is to make databases self-tuning and self-administering by exploiting knowledge of the workload. Vivek Narasayya was my primary collaborator in the early years, and we were subsequently joined by other colleagues in this effort. Our primary focus was on automated physical database design as well as automated statistics management in relational systems. The Index Tuning Wizard in Microsoft SQL Server 7.0 and SQL Server 2000 is based on the technology that we developed as part of this project; these releases represented the first workload-driven commercial physical design tools for relational systems, recommending indexes and indexes + materialized views, respectively. The scope of the automated physical design technology has since been expanded and made available in the Database Tuning Advisor feature of SQL Server 2005 and subsequent releases. The AutoAdmin project page has a detailed description of the project and its publications. Recently, I have become interested in the related problem of resource management for multi-tenant database systems.
I initiated the Data Cleaning project in 2000 with the goal of developing tools and server infrastructure to support data preparation, an essential step before effective data analysis. Venkatesh Ganti was our leading researcher on this project in the early years. Our work led to the fuzzy matching and fuzzy de-duplication transforms in the SQL Server Integration Services component of SQL Server 2005 (and subsequent releases). In recent years, we have incorporated our Data Cleaning technology into Bing.
Text documents and structured relational data are both sources of our information. Understanding the synergy between these two sources has been a longstanding interest of mine. I started looking at this problem in the mid-nineties (SIGMOD 1995), when we studied the problem of “join” between relational tables and text repositories. Later, we investigated the problem of keyword search over structured databases (IEEE ICDE 2002) and the problem of auto-ranking of answers in database queries (CIDR 2003, VLDB 2004, CIDR 2005). More recently, we have been looking at the problem of entity search (WWW 2008). Ideas from this project have been incorporated into Bing.
Last but not least, I am interested in the problem of supporting business intelligence and decision support queries more effectively on data platforms. In the past, I have worked on optimization of complex SQL queries, e.g., optimization of queries with group-by (VLDB 2004), user-defined predicates (VLDB 2006), exploiting factorization for index union/intersection plans (SIGMOD 2003), and data mining predicates (IEEE ICDE 2002). One of the directions I have pursued is revisiting the fundamental assumptions in query optimization (SIGMOD 2005, SIGMOD 2009). Currently, I am exploring techniques and tools for “Big Data” enterprise analytic platforms.
(in collaboration with project members)
- SQL Server Index Tuning Wizard and Database Tuning Advisor (AutoAdmin project)
- Fuzzy Lookup and Fuzzy Grouping Transforms in SQL Server Integration Services (Data Cleaning project)
- Query Services and Catalog Data Quality for Bing Shopping (Data Cleaning and Entity Search projects)