I think hard about abstractions that help programmers easily express complex problems yet are sufficiently constrained such that we can build runtimes to efficiently implement those abstractions. Broadly speaking, I build abstractions and runtimes for (i) probabilistic programming and approximate computing and (ii) parallel programming.
- Probabilistic Programming in General Purpose Programs: Applications that sense and reason about the complexity of the world use estimates. Mobile phone applications estimate location with GPS sensors, search estimates information needs from search terms, machine learning estimates hidden parameters from data, and approximate hardware estimates precise hardware to improve energy efficiency. The difference between an estimate and its true value is uncertainty. This work builds novel tools and runtimes to help developers reason about uncertain data. Uncertain is a generic type which lifts operations on T to distributions over T. Its abstractions are designed to help novice developers incorporate, compute, and reason about uncertain data in their programs without requiring a PhD degree in statistics. Likewise, probabilistic assertions let programmers specify quality constraints in programs as probabilistic correctness properties and our runtime and compiler efficiently verifies whether those probabilistic assertions are likely to hold.
- Data Parallel Finite State Machines and Dynamic Programming: Hardware is parallel. While desktops, tablets, and even phones have vector instructions, multiple cores, and GPUs, many important algorithms are unable to exploit all of this parallelism. In this work, we introduce convergence, a new method for breaking data dependencies in loops and demonstrate how it makes FSMs and a class of dynamic programming algorithms data parallel. We frame both FSMs and dynamic programming algorithms as the repeated application of matrix-vector and matrix-matrix products in an appropriate semring. Convergence is an empirical property of repeated matrix-matrix products wherein the rank of the resulting matrix after a small sequence of matrix-matrix products tends to be small. Our results demonstrate multi-factor speedups on data parallel hardware.