Physical Database Design
One of the primary areas of focus in the AutoAdmin project has been the challenge of automating physical database design: the problem of identifying the appropriate physical design structures (e.g. indexes) for a given database and workload. Judicious choice of the physical design is important since a well-chosen design can often improve workload performance by orders of magnitude. It is also a very challenging task that typically requires skilled DBAs who can reason about the trade-offs of creating or dropping physical design structures for the workload. Our research on this problem led to the development of the Index Tuning Wizard, which shipped as part of Microsoft SQL Server 7.0 in 1998 and was built in close collaboration with the Microsoft SQL Server product group. The Index Tuning Wizard was the first tool of its kind in any commercial DBMS; other prominent DBMS vendors subsequently followed suit with similar physical design tuning tools. The architecture of the tool and the underlying algorithms are described in the VLDB 1997 paper “An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server”, which won the VLDB 10-Year Best Paper Award in 2007. One of the key aspects of the architecture is an extension to the query optimizer to support a “what-if” interface: given a query and a set of hypothetical physical design structures (e.g. indexes that do not currently exist), return the plan that the optimizer would have picked had that set of structures actually been materialized. This interface, described in the SIGMOD 1998 paper “AutoAdmin ‘What-If’ Index Analysis Utility”, allows physical design tools to consider a large space of alternative physical designs without actually materializing any of them. Similar “what-if” interfaces were subsequently adopted by other commercial DBMSs as well.
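To make the idea concrete, here is a minimal sketch of how a tuning tool can drive index selection through a what-if interface. All index names and costs below are hypothetical, and the cost lookup table merely stands in for calls to the optimizer's what-if API; this is an illustration of the search pattern, not the actual Index Tuning Wizard algorithm.

```python
# A toy sketch of "what-if"-driven index selection (all index names and
# costs here are hypothetical; a real tool would obtain costs by calling
# the optimizer's what-if interface instead of a lookup table).

# Hypothetical optimizer cost estimates for each candidate configuration.
WHAT_IF_COST = {
    frozenset(): 100.0,
    frozenset({"idx_orders_date"}): 40.0,
    frozenset({"idx_cust_name"}): 85.0,
    frozenset({"idx_orders_date", "idx_cust_name"}): 35.0,
}

def what_if_cost(config):
    """Estimated workload cost if `config` (a set of hypothetical indexes)
    were materialized; falls back to the cheapest known subset, mimicking
    an optimizer that simply ignores indexes it cannot use."""
    return min(c for cfg, c in WHAT_IF_COST.items() if cfg <= config)

def greedy_select(candidates, budget):
    """Greedily add the index giving the best what-if cost improvement,
    up to `budget` indexes, without materializing anything."""
    chosen = frozenset()
    while len(chosen) < budget:
        remaining = candidates - chosen
        if not remaining:
            break
        best = min((chosen | {c} for c in remaining), key=what_if_cost)
        if what_if_cost(best) >= what_if_cost(chosen):
            break
        chosen = best
    return chosen
```

The point of the what-if interface is exactly that the loop above can explore many configurations while the only side effect is optimizer calls, never physical index builds.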
In Microsoft SQL Server 2000, the Index Tuning Wizard was extended to also recommend materialized views. The VLDB 2000 paper “Automated Selection of Materialized Views and Indexes for SQL Databases” describes some of the key technical ideas for dealing with the large space of alternatives that arises once materialized views are included. In the Microsoft SQL Server 2005 release, the tool evolved from a wizard into a full-blown application called the Database Engine Tuning Advisor (DTA). DTA has been used effectively by many real-world customers of Microsoft SQL Server. Its recommendations have resulted in order-of-magnitude speedups for complex workloads on very large databases, and typically yield performance comparable to hand-tuned physical designs chosen by DBAs. Further information about the quality of DTA’s recommendations and its scalability to large and complex real-world workloads can be found in the VLDB 2004 paper “Database Tuning Advisor for Microsoft SQL Server 2005”. We have also published details of DTA’s algorithms, including techniques for handling physical design structures such as partitioning, in the SIGMOD 2004 paper “Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design”. Finally, in many real-world scenarios, physical design tuning must respect rich constraints, e.g. that no query in the workload should degrade by more than 10% with respect to the current configuration. Our work on constrained physical design tuning won the Best Paper Award at VLDB 2008.
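The "no query degrades by more than 10%" constraint mentioned above can be checked mechanically once per-query cost estimates are available for both configurations. A minimal sketch (the function name and the use of plain cost lists are illustrative, not part of the published algorithm):

```python
def satisfies_no_regression(current_costs, proposed_costs, max_degradation=0.10):
    """Check the constraint that no query in the workload degrades by more
    than `max_degradation` (default 10%) relative to the current
    configuration. Costs are per-query optimizer estimates, with the two
    lists aligned by query."""
    return all(
        new <= old * (1.0 + max_degradation)
        for old, new in zip(current_costs, proposed_costs)
    )
```

A constrained tuner would use such a predicate to prune candidate configurations that improve the workload overall but unacceptably penalize individual queries.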
We have also explored other modes of physical design tuning. We studied how to build a low-overhead physical design “alerter” that helps DBAs answer the question: “Is it worthwhile to tune the database now?” Another interesting problem arises when no DBA is available (e.g. an embedded database or a small business). In such scenarios, a low-touch, continuous index tuning approach may become important. We have explored solutions for such scenarios in two papers: “To Tune or not to Tune? A Lightweight Physical Design Alerter” in VLDB 2006 and “An Online Approach to Physical Design Tuning” in ICDE 2007.
Our initial work on physical database design was awarded the VLDB 10-Year Best Paper Award; in this context, we wrote an overview article on the progress made in physical design tuning as well as other facets of self-tuning database systems.
In addition to physical design structures, DTA also recommends appropriate statistics (e.g. histograms on columns of a base table) for the workload. Statistics are crucial for workload performance since the quality of the plan generated by the query optimizer depends heavily on the available statistics. Our work on techniques for determining appropriate statistics for a workload won the Best Paper Award at ICDE 2001. We have also worked on the problem of exploiting statistics on views during query optimization (SIGMOD 2002, ICDE 2003). Using statistics on views can potentially improve plan quality significantly, since a view can be defined on the result of a query expression, thereby eliminating errors due to assumptions that the optimizer would otherwise need to make.
Monitoring and Execution Feedback
An important requirement for a self-tuning database system is the ability to efficiently monitor the database and identify potential performance problems. As part of the AutoAdmin project we are investigating some of these issues. We developed a SQL Continuous Monitoring engine (SQLCM), in which DBAs can declaratively specify the monitoring tasks they wish to perform and have the engine execute them efficiently, with low overhead and a small memory footprint.
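A minimal sketch of the event-condition-action style that such declarative monitoring suggests is shown below. The rule format, class names, and the example rule are illustrative assumptions, not SQLCM's actual interface:

```python
class MonitoringEngine:
    """Minimal event-condition-action monitoring engine in the spirit of
    declarative monitoring (the rule format here is illustrative, not
    SQLCM's actual specification language)."""

    def __init__(self):
        self.rules = []
        self.alerts = []

    def add_rule(self, event, condition, action):
        """Register a rule: on `event`, if `condition(payload)` holds,
        run `action`."""
        self.rules.append((event, condition, action))

    def on_event(self, event, payload):
        # Evaluate only the rules registered for this event type,
        # keeping per-event overhead low.
        for ev, cond, act in self.rules:
            if ev == event and cond(payload):
                act(self, payload)

engine = MonitoringEngine()
# Hypothetical rule: alert whenever a query runs longer than 30 seconds.
engine.add_rule(
    event="query_finished",
    condition=lambda p: p["duration_sec"] > 30,
    action=lambda eng, p: eng.alerts.append(p["query_id"]),
)
engine.on_event("query_finished", {"query_id": "Q1", "duration_sec": 45})
engine.on_event("query_finished", {"query_id": "Q2", "duration_sec": 5})
```

The appeal of the declarative approach is that the DBA states only the event, the condition, and the action; the engine decides how to evaluate the rules cheaply inside the server.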
A functionality of great interest to DBAs is estimating the progress of long-running SQL queries, which are common in decision support systems. Accurate progress estimates allow DBAs to make judicious decisions about whether to let a long-running query that is consuming significant resources run to completion or to kill it. We defined a model of progress for a SQL query, developed efficient progress estimators, and analytically quantified the inherent hardness of the problem.
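One simple way to model progress, sketched below, is as the fraction of work already done at a set of "driver" operators (e.g. table scans), measured in tuples produced relative to the optimizer's estimates. This is a simplified illustration of the flavor of such estimators, not the published algorithm; the operator names are hypothetical.

```python
def estimated_progress(observed, estimated):
    """Sketch of a progress estimator: progress is the fraction of tuples
    already produced at the driver operators, relative to estimated
    totals. `observed` and `estimated` map operator name -> tuple count.
    Accuracy hinges on the estimates, which is one source of the
    problem's inherent hardness."""
    total_est = sum(estimated.values())
    if total_est == 0:
        return 1.0
    # Cap observed counts at their estimates so progress never exceeds 1.
    done = sum(min(observed.get(op, 0), est) for op, est in estimated.items())
    return done / total_est
```

For example, with two scans each estimated at 1000 tuples and 500 tuples produced so far by the first, the estimate is 25% done; refining the per-operator estimates at run time refines the progress estimate.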
The performance of a query can depend heavily on the quality of the execution plan chosen by the query optimizer. Often, many key parameters used by the optimizer cannot be reliably estimated up front when the plan is chosen. We have therefore studied how feedback from query execution can be used to improve the optimizer’s choice of execution plan. First, we proposed the concept of Self-Tuning Histograms (SIGMOD 1999, SIGMOD 2001), where statistics (single- and multi-dimensional histograms) are built using execution feedback, i.e. cardinalities observed during query execution. There has subsequently been much follow-on research in the database community on this topic.
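The core refinement loop of a self-tuning histogram can be sketched in one dimension: estimate a range predicate from the current buckets, observe the actual cardinality at execution time, and push the error back into the overlapping buckets. This is a deliberately simplified illustration of the idea; the damping scheme and bucket update rule below are assumptions, not the published SIGMOD 1999/2001 algorithms.

```python
class SelfTuningHistogram:
    """Simplified one-dimensional self-tuning histogram: bucket
    frequencies start from a guess and are refined using cardinalities
    observed during query execution (illustrative update rule)."""

    def __init__(self, boundaries, freqs):
        self.boundaries = boundaries  # e.g. [0, 10, 20] -> buckets [0,10), [10,20)
        self.freqs = freqs            # one frequency per bucket

    def _overlap(self, i, lo, hi):
        b_lo, b_hi = self.boundaries[i], self.boundaries[i + 1]
        return max(0, min(hi, b_hi) - max(lo, b_lo))

    def estimate(self, lo, hi):
        """Estimated cardinality of lo <= x < hi, assuming uniformity
        within each bucket."""
        total = 0.0
        for i, f in enumerate(self.freqs):
            width = self.boundaries[i + 1] - self.boundaries[i]
            total += f * self._overlap(i, lo, hi) / width
        return total

    def feedback(self, lo, hi, actual, damping=0.5):
        """Distribute the observed estimation error over the buckets
        overlapping [lo, hi), in proportion to their overlap."""
        error = actual - self.estimate(lo, hi)
        overlaps = [self._overlap(i, lo, hi) for i in range(len(self.freqs))]
        total_overlap = sum(overlaps)
        if total_overlap == 0:
            return
        for i, ov in enumerate(overlaps):
            self.freqs[i] = max(
                0.0, self.freqs[i] + damping * error * ov / total_overlap
            )
```

After a query over [0, 10) reveals an actual cardinality of 150 against an estimate of 100, the first bucket's frequency moves halfway toward the observation, so subsequent estimates for that range improve without ever rescanning the data.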
Second, we showed that there are many simple cases where existing mechanisms for obtaining execution feedback are limiting, i.e. they do not help improve the execution plan no matter how many times the query is executed. We therefore studied how to augment execution feedback mechanisms to overcome some of these limitations while keeping the monitoring overhead at query execution time low. For example, executing an alternate plan whose cost is not much higher than that of the current plan, but which yields accurate cardinalities for important expressions, can significantly improve overall performance.
We have also recently worked on other problems in the context of the AutoAdmin project:
- We developed PHints, a flexible query hinting mechanism that gives DBAs and application developers fine-grained control over the execution plan generated for a query.
- In many database applications, performance problems can often be traced to poor application design, e.g. dynamic SQL generated by application code. We have investigated techniques for statically analyzing and dynamically profiling database application code to help identify, and potentially fix, such performance problems.
We are studying novel aspects of the physical database design problem:
- Data compression can be important in data warehousing to improve I/O performance and reduce storage costs. We are studying how the physical database design problem is affected by the availability of compression methods in databases such as null suppression, dictionary compression and run-length encoding (RLE).
- Index fragmentation can be a major contributor to I/O performance degradation in databases. We are studying the problem of automatic index defragmentation to help identify and overcome the impact of index fragmentation on workload I/O performance.
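The interaction between compression and physical design mentioned above is easy to see with run-length encoding: the same column compresses very differently depending on how the data is physically ordered, so the best sort order or clustering choice can depend on the compression method. A minimal sketch (the sample column values are hypothetical):

```python
def rle_encode(values):
    """Run-length encoding: each maximal run of equal values becomes a
    (value, count) pair. Illustrates why physical design interacts with
    compression: sorted or clustered data forms longer runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

unsorted_col = ["b", "a", "b", "a", "b", "a"]
sorted_col = sorted(unsorted_col)
# Sorting the column shrinks its RLE representation from 6 runs to 2,
# even though the logical content is identical.
```

Because of effects like this, a compression-aware physical design tool cannot evaluate sort orders, clustering, and compression methods independently; the cost of a candidate design depends on their combination.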
A broad area that we are focusing on is investigating how machine learning techniques can be applied effectively to challenging problems in database monitoring and performance tuning, thereby leveraging historical information from query execution more deeply.