High-Performance Incremental Processing of Complex Analytical Queries


March 15, 2016


Milos Nikokic


EPFL Switzerland


Many of today’s popular computing applications require real-time analytics over large and dynamic datasets, from social web applications to online data warehousing, network monitoring, and algorithmic trading. These applications have long-lived analysis queries that require low latency processing over rapidly changing datasets.

In this talk, I will present techniques for efficient incremental processing of complex analytical queries, ranging from classical SQL queries to linear algebra programs. Our system, called DBToaster, compiles declarative database queries into high-performance stream processing engines that keep query results (views) fresh at very high update rates. DBToaster uses a recursive query compilation algorithm that materializes a supporting set of higher-order delta views to achieve a substantially lower view maintenance cost. Our implementation supports batched processing in local and distributed environments and can deliver up to 5 orders of magnitude better performance than existing DBMS and stream engines. The LINVIEW system focuses on the incremental computation of iterative linear algebra programs that consist of the standard matrix operations. LINVIEW uses matrix factorization techniques to make the incremental computation of standard machine learning algorithms, like linear regression, practical and usually substantially cheaper than re-evaluation. LINVIEW generates parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude.