On-The-Fly Progress Detection In Iterative Stream Queries

Badrish Chandramouli, Jonathan Goldstein, David Maier

International Conference on Very Large Databases (VLDB), Lyon, France |

Multiple researchers have proposed cyclic query plans for evaluating iterative queries over streams or rapidly changing input. The Declarative Networking community uses cyclic plans to evaluate Datalog programs that track reachability and other graph traversals on networks. Cyclic query plans can also evaluate pattern-matching and other queries based on event sequences.

An issue with cyclic queries over dynamic inputs is knowing when the query result has progressed to a certain point in the input, since the number of iterations is data dependent. One option is a “strictly staged” computation, where the query plan quiesces between inputs. This option introduces significant latency, and may also “underload” inter-operator buffers. An alternative is to settle for soft guarantees, such as “eventual consistency”. Such imprecision can make it difficult, for example, to know when to purge state from stateful operators.

We propose a third option in which cyclic queries run continuously, but detect progress “on the fly” by means of a Flying Fixed-Point (FFP) operator. FFP sits on the cyclic loop and circulates speculative predictions on forward progress, which it then validates. FFP is always able to track progress for a class of queries we term strongly convergent. A key advantage of FFP is that it works with existing algebra operators, thereby inheriting their capabilities, such as windowing and dealing with out-of-order input. Also, for stream systems that explicitly model input-event lifetimes, we know exactly which values are in the query result at each point in time.

A key implementation decision is the method for speculating. Using the high-water mark of data events minimizes the number of speculative punctuations. Probing operators on the cyclic loop to determine their external progress circulates many more speculative messages, but tracks actual output progress more closely. We show how a hybrid approach limits predictions while coming close the progress-tracking ability of Probing.