Privacy, Accuracy, And Consistency Too: A Holistic Solution To Contingency Table Release

Boaz Barak; Kamalika Chaudhuri; Cynthia Dwork; Satyen Kale; Frank McSherry; Kunal Talwar

Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems | June 2007

Published by Association for Computing Machinery, Inc.

The contingency table is a workhorse of official statistics, the format of reported data for the US Census, the Bureau of Labor Statistics, and the Internal Revenue Service. In settings such as these, privacy is not only an ethical obligation but frequently a legal one as well. Consequently, there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously. Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone, and hence without weakening privacy, we find and output the "nearest" consistent set of marginals. Interestingly, this set is no farther from the privacy-preserving marginals than the marginals of the raw data are, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself. The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
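The "nearest consistent set of marginals" idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes binary attributes, uses Laplace noise with an arbitrary scale that is not calibrated to any privacy guarantee, and finds the nearest consistent marginals by an ordinary least-squares projection that ignores nonnegativity and integrality of the underlying table. The names `marginal_operator`, `attribute_sets`, and the toy data are hypothetical.

```python
import itertools
import numpy as np

def marginal_operator(k, attribute_sets):
    """Build the 0/1 matrix that maps a flattened 2^k contingency table
    (binary attributes, an assumption of this sketch) to the stacked
    marginal tables over the given attribute subsets."""
    cells = list(itertools.product([0, 1], repeat=k))
    rows = []
    for attrs in attribute_sets:
        for setting in itertools.product([0, 1], repeat=len(attrs)):
            rows.append([
                1.0 if all(cell[a] == v for a, v in zip(attrs, setting)) else 0.0
                for cell in cells
            ])
    return np.array(rows)

rng = np.random.default_rng(0)
k = 3
attribute_sets = [(0, 1), (1, 2)]  # two overlapping 2-way marginals
M = marginal_operator(k, attribute_sets)

# Toy raw contingency table and its exact marginals.
table = rng.integers(0, 50, size=2 ** k).astype(float)
true_marginals = M @ table

# Stand-in for a privacy mechanism: independent Laplace noise per cell.
# The scale is illustrative only, not tied to a privacy parameter.
noisy = true_marginals + rng.laplace(scale=2.0, size=true_marginals.shape)

# Nearest (in L2) marginals that arise from a single underlying table:
# project the noisy vector onto the column space of M.
x_hat, *_ = np.linalg.lstsq(M, noisy, rcond=None)
consistent = M @ x_hat

# The raw-data marginals are themselves consistent, so the projection
# is never farther from the noisy marginals than they are.
dist_consistent = np.linalg.norm(noisy - consistent)
dist_true = np.linalg.norm(noisy - true_marginals)
print(dist_consistent <= dist_true + 1e-9)

# Consistency check: both released tables imply the same count for
# "attribute 1 = 0" after projection, which the noisy tables need not.
print(abs((consistent[0] + consistent[2]) - (consistent[4] + consistent[5])))
```

Because the raw-data marginals lie in the set being projected onto, the triangle inequality bounds the distance from the consistent marginals to the raw-data marginals by twice the noise distance, which mirrors the abstract's claim that consistency adds no more error than the privacy mechanism itself.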

Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library: http://www.acm.org/dl/.