Incremental, Iterative, and Interactive Computation using Differential Dataflow


May 23, 2013


This talk covers differential dataflow, a new computational framework that generalizes standard incremental dataflow to allow far greater re-use of previous results when collections change. Informally, differential dataflow distinguishes among the reasons a collection might change, including both loop feedback and new input data, so that a system can re-use the most appropriate results from previously performed work when an incremental update arrives. Our implementation of differential dataflow efficiently executes queries with multiple (possibly nested) loops while simultaneously responding with low latency to incremental changes to the inputs. We show how differential dataflow yields orders-of-magnitude speedups for a variety of workloads on real data and enables analyses that were not previously possible in an interactive setting.
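
As a rough illustration of the idea of separating the reasons a collection changes, the toy Python sketch below stores a collection as signed differences indexed by a pair (input epoch, loop iteration) and reconstructs any version by summing the differences at all versions that precede it in both coordinates. The class name DifferenceTrace, its methods, and the sample records are hypothetical and chosen only for this example; this is not the implementation described in the talk.

```python
from collections import Counter


class DifferenceTrace:
    """Toy store: a collection kept as differences per (epoch, iteration)."""

    def __init__(self):
        # (epoch, iteration) -> Counter mapping record -> signed count change
        self.diffs = {}

    def update(self, version, record, delta):
        """Record that `record` changed by `delta` copies at `version`."""
        self.diffs.setdefault(version, Counter())[record] += delta

    def at(self, version):
        """Rebuild the collection at `version` by summing every difference
        whose epoch and iteration are both <= those of `version`."""
        epoch, iteration = version
        total = Counter()
        for (e, i), diff in self.diffs.items():
            if e <= epoch and i <= iteration:
                for record, delta in diff.items():
                    total[record] += delta
        # Drop records whose accumulated count is zero.
        return {r: c for r, c in total.items() if c != 0}


trace = DifferenceTrace()

# Epoch 0, iteration 0: the initial input.
trace.update((0, 0), "a", +1)
trace.update((0, 0), "b", +1)

# Epoch 0, iteration 1: loop feedback replaces "b" with "c".
trace.update((0, 1), "b", -1)
trace.update((0, 1), "c", +1)

# Epoch 1, iteration 0: new input data adds "d".
trace.update((1, 0), "d", +1)

# Version (1, 1) was never written directly; it is assembled from the
# loop-feedback change and the input change recorded above.
result = trace.at((1, 1))  # {"a": 1, "c": 1, "d": 1}
```

In this toy case no difference is needed at version (1, 1) because the two changes do not interact. In general a further correction at (1, 1) would be required when they do, and that correction is all that would have to be computed.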

This is joint work with Derek G. Murray, Rebecca Isaacs, and Michael Isard.


Frank McSherry

Frank McSherry is a researcher at the Microsoft Research Silicon Valley Lab, where he works on issues related to large-scale data analysis. His recent focus has been on data privacy, where he helped define differential privacy and designed and built the Privacy Integrated Queries data analysis platform.