Selectivity Estimation for Range Predicates using Lightweight Models

45th International Conference on Very Large Data Bases (VLDB 2019)

Publication

Query optimizers depend on selectivity estimates of
query predicates to produce a good execution plan.
When a query contains multiple predicates, today’s
optimizers use a variety of assumptions, such as
independence between predicates, to estimate selectivity.
While such techniques have the benefit of fast estimation
and small memory footprint, they often incur large
selectivity estimation errors. In this work, we reconsider
selectivity estimation as a regression problem. We explore
application of neural networks and tree-based ensembles
to the important problem of selectivity estimation of
multi-dimensional range predicates. While a straightforward
solution does not outperform the baseline, we propose two
simple yet effective design choices, i.e., regression label
transformation and feature engineering, motivated by the
selectivity estimation context. Through extensive empirical
evaluation across a variety of datasets, we show that the
proposed models deliver both highly accurate estimates
and fast estimation.
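
To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares the true selectivity of a two-dimensional range predicate on correlated synthetic data with the estimate obtained under the independence assumption, then shows what a label transformation and an engineered feature vector might look like before training a regressor. The abstract does not specify which transformation or features are used, so the log transform in `to_label`, the layout of `featurize`, the `q_error` metric, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def true_selectivity(rows, lo, hi):
    """Fraction of rows inside the box lo[d] <= x[d] <= hi[d] for every d."""
    hits = sum(all(l <= v <= h for v, l, h in zip(row, lo, hi)) for row in rows)
    return hits / len(rows)


def independence_estimate(rows, lo, hi):
    """Product of per-dimension selectivities (the independence assumption)."""
    est = 1.0
    for d, (l, h) in enumerate(zip(lo, hi)):
        est *= sum(l <= row[d] <= h for row in rows) / len(rows)
    return est


def to_label(sel, n_rows):
    """Assumed label transform: log of selectivity, floored at one row."""
    return math.log(max(sel, 1.0 / n_rows))


def from_label(label):
    """Inverse of to_label: map a regressor's output back to a selectivity."""
    return math.exp(label)


def featurize(rows, lo, hi):
    """Assumed feature vector: range bounds plus the log of a cheap heuristic
    estimate, so a regressor can learn to correct that heuristic."""
    heuristic = independence_estimate(rows, lo, hi)
    return list(lo) + list(hi) + [to_label(heuristic, len(rows))]


def q_error(est, actual, n_rows):
    """Ratio error between estimate and truth (always >= 1)."""
    floor = 1.0 / n_rows
    est, actual = max(est, floor), max(actual, floor)
    return max(est / actual, actual / est)


# Synthetic, strongly correlated two-column table (illustrative only).
random.seed(0)
rows = []
for _ in range(10000):
    x = random.random()
    y = min(1.0, max(0.0, x + random.gauss(0, 0.05)))
    rows.append((x, y))

lo, hi = (0.0, 0.0), (0.2, 0.2)
actual = true_selectivity(rows, lo, hi)
indep = independence_estimate(rows, lo, hi)

print("true selectivity:     ", actual)
print("independence estimate:", indep)
print("q-error of baseline:  ", q_error(indep, actual, len(rows)))
print("regression target:    ", to_label(actual, len(rows)))
print("feature vector:       ", featurize(rows, lo, hi))
```

Because the two columns are correlated, the independence estimate substantially underestimates the true selectivity here. In this sketch, pairs of `featurize(...)` and `to_label(...)` values computed over many training queries would be fed to any regression model, such as a neural network or a tree-based ensemble, and predictions would be mapped back with `from_label`.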