PCPs and the Hardness of Generating Private Synthetic Data


November 19, 2010


Jonathan Ullman


We study the computational complexity of differentially private data release for simple, natural queries. Without computational constraints, the work of Blum, Ligett, and Roth (STOC ’08) showed that it is possible to produce differentially private “synthetic databases” that approximately preserve rich classes of statistics about the original database. Dwork et. al. (STOC ’09) showed that the inefficiency of their mechanism is inherent for certain families of queries, but left open whether or not simple statistics can be approximately preserved efficiently and with differential privacy.

We extend the hardness results of Dwork et. al. to simple, natural families of queries. In particular, assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm A that takes a database D in (0,1n)d and outputs a synthetic database D’ all of whose two-way marginals are approximately equal to those of D. (A two-way marginal is the fraction of database rows x ? 0,1d with a given pair of values in a given pair of columns.) Our proof combines a construction of hard-to-sanitize databases based on digital signatures from Dwork et. al. with encodings based on the PCP Theorem.

Joint work with Salil Vadhan.


Jonathan Ullman

Jon Ullman is a second-year PhD student in the School of Engineering and Applied Sciences at Harvard, advised by Salil Vadhan. Jon is interested in the theory of differentially private data analysis and more broadly in Cryptography and Complexity Theory.