A Provably Correct Sampler for Probabilistic Programs

Foundations of Software Technology and Theoretical Computer Science (FSTTCS) |

Published by Leibniz International Proceedings in Informatics

We consider the problem of inferring the implicit distribution specified by a probabilistic program. A popular inference technique for probabilistic programs called Markov Chain Monte Carlo or MCMC sampling involves running the program repeatedly and generating sample values by perturbing values produced in “previous runs”. This simulates a Markov chain whose stationary distribution is the distribution specified by the probabilistic program. However, it is non-trivial to implement MCMC sampling for probabilistic programs since each variable could be updated at multiple program points. In such cases, it is unclear which values from the “previous run” should be used to generate samples for the “current run”. We present an algorithm to solve this problem for the general case and formally prove that the algorithm is correct. Our algorithm handles variables that are updated multiple times along the same path, updated along different paths in a conditional statement, or repeatedly updated inside loops, We have implemented our algorithm in a tool called SampleX. We empirically demonstrate that SampleX produces the correct result for various benchmarks, whereas existing tools such as R2 and Stan produce incorrect results on several of these benchmarks.