Interactive Demonstration of Probabilistic Predicates

Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2018)

Published by ACM SIGMOD

We will demonstrate a prototype query processing engine that uses probabilistic predicates (PPs) to speed up machine learning inference jobs. In current analytic engines, machine learning functions are modeled as user-defined functions (UDFs), which are both time and resource intensive. These UDFs prevent predicate pushdown: predicates that use the outputs of the UDFs cannot be evaluated before the UDFs run. Hence, considerable time and resources are wasted applying the UDFs to inputs that the subsequent predicate will reject. We use PPs, which are lightweight classifiers applied directly to the raw input, to filter out data blobs that disagree with the query predicate. By reducing the input that the UDFs must process, PPs substantially improve query processing. We will show that PPs are broadly applicable by constructing PPs for many inference tasks, including image recognition, document classification, and video analysis. We will also demonstrate query optimization methods that extend PPs to complex query predicates and support different accuracy requirements.
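
The filtering idea described above can be illustrated with a minimal Python sketch. It is not the prototype engine: the names `expensive_udf`, `pp_score`, `run_query`, and `pp_threshold`, the toy data, and the scoring rules are all hypothetical stand-ins chosen for illustration. The sketch only shows how a cheap classifier applied to the raw input can skip UDF calls on blobs that the query predicate would reject anyway.

```python
from typing import List, Optional

Blob = List[float]  # toy stand-in for a raw data blob (e.g. image features)


def expensive_udf(blob: Blob) -> str:
    """Hypothetical stand-in for a costly ML UDF that labels a blob."""
    expensive_udf.calls += 1  # count invocations to show the savings
    return "car" if sum(blob) / len(blob) > 0.5 else "other"


expensive_udf.calls = 0


def pp_score(blob: Blob) -> float:
    """Hypothetical PP: a cheap score computed directly on the raw blob."""
    return max(blob)


def run_query(blobs: List[Blob], pp_threshold: Optional[float] = None) -> List[Blob]:
    """Return the blobs whose UDF label is 'car'.

    If pp_threshold is given, blobs scoring below it are dropped
    before the UDF is applied.
    """
    results = []
    for blob in blobs:
        if pp_threshold is not None and pp_score(blob) < pp_threshold:
            continue  # PP rejects the blob, so the UDF is never run on it
        if expensive_udf(blob) == "car":
            results.append(blob)
    return results


blobs = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.6], [0.3, 0.1], [0.2, 0.4]]

expensive_udf.calls = 0
baseline = run_query(blobs)
baseline_calls = expensive_udf.calls

expensive_udf.calls = 0
filtered = run_query(blobs, pp_threshold=0.5)
pp_calls = expensive_udf.calls

print(f"UDF calls without PP: {baseline_calls}, with PP: {pp_calls}")
print(f"Same result: {baseline == filtered}")
```

In this toy setup the PP never discards a matching blob, because a mean above 0.5 implies a maximum above 0.5. A real PP is a learned classifier that can produce false negatives, so its threshold would be chosen against the query's accuracy requirement.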