Accurate Query Optimization by Sub-plan Memoization
Query optimizers use approximate techniques such as histograms or sampling for result size and distinct value estimation, even though these techniques may incur high estimation errors, leading the optimizer to choose sub-optimal query execution plans. In this report, we propose a novel approach to query optimization that provides the query optimizer with exact values for the result size of operators and operator trees, which we call sub-plans, and for the number of distinct values in the output of these sub-plans. In our approach, the query optimizer optimizes the query and records all the sub-plans for which result size or distinct value estimates are required in a data structure that we call the sub-plan memo. After query optimization is completed, the sub-plans in the sub-plan memo are executed and their actual result sizes and the number of distinct values in their outputs are recorded in the memo. The optimizer then re-optimizes the query using the more accurate result size and distinct value information in the sub-plan memo, which results in potentially choosing a better query execution plan. The process is repeated if re-optimization encounters sub-plans that are not in the sub-plan memo. This technique can result in potentially increasing the optimization time sharply. However, it is very effective in choosing the optimal query execution plan. Therefore, this approach is primarily suitable for frequently executed queries embedded in application programs. We present an experimental evaluation of our proposed approach based on a prototype implementation in Microsoft SQL Server 2000.