Abstract

In a census, individual respondents provide private information to a trusted party (the census bureau), which publishes a sanitized version of the data. Two requirements are fundamentally in conflict: privacy for the respondents and utility of the sanitized data. This framework is inherently noninteractive. Recently, Chawla et al. (TCC 2005) initiated a theoretical study of the census problem. They presented an intuitively appealing definition of privacy breach, called isolation, together with a formal specification of what a data sanitization algorithm must achieve: access to the sanitized data should not increase an adversary's ability to isolate any individual. They also showed that if the data are drawn uniformly from a high-dimensional hypercube, then recursive histogram sanitization preserves privacy with high probability. We extend these results in several ways. First, we develop a method for computing a privacy-preserving histogram sanitization of "round" distributions, such as the uniform distribution over a high-dimensional ball or sphere. This problem is challenging because, unlike in the hypercube case, the natural histogram over such a distribution may contain long, thin cells that undermine the privacy proof. Second, we develop techniques for randomizing the histogram constructions for both the hypercube and the hypersphere. These randomized constructions let us apply known results for approximating, from histogram counts alone, various quantities of interest (e.g., the cost of a minimum spanning tree, or the cost of an optimal solution to the facility location problem over the data points) in a privacy-preserving fashion.
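To make the idea of a recursive histogram over the hypercube concrete, here is a minimal Python sketch. It is illustrative only and rests on stated assumptions: the function name `recursive_histogram`, the threshold `t` (a cell is split into its 2^d half-width subcubes while it holds more than `t` points), the `max_depth` guard, and the choice to release only the leaf cells with their counts are all hypothetical. The exact stopping rule, and how counts are turned into sanitized points, follow the cited work and are not reproduced here.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
# A released leaf cell: (lower corner, upper corner, number of points inside).
Cell = Tuple[Tuple[float, ...], Tuple[float, ...], int]


def recursive_histogram(points: Sequence[Point], lo: Point, hi: Point,
                        t: int, max_depth: int = 20) -> List[Cell]:
    """Hypothetical sketch of recursive histogram construction on a hypercube.

    Assumed rule: split a cell into its 2^d subcubes while it holds more than
    `t` points; `max_depth` stops recursion on (near-)duplicate points.
    Only the leaf cells and their counts are returned.
    """
    if len(points) <= t or max_depth == 0:
        return [(tuple(lo), tuple(hi), len(points))]
    mid = tuple((a + b) / 2 for a, b in zip(lo, hi))
    cells: List[Cell] = []
    # Each child is identified by one bit per coordinate: 0 = lower half, 1 = upper half.
    for bits in product((0, 1), repeat=len(lo)):
        c_lo = tuple(m if b else l for b, l, m in zip(bits, lo, mid))
        c_hi = tuple(h if b else m for b, h, m in zip(bits, hi, mid))
        # Half-open assignment: coordinate x goes to the upper half iff x >= mid,
        # so every point lands in exactly one child.
        inside = [p for p in points
                  if all((x >= m) == bool(b) for x, m, b in zip(p, mid, bits))]
        cells.extend(recursive_histogram(inside, c_lo, c_hi, t, max_depth - 1))
    return cells


if __name__ == "__main__":
    import random

    random.seed(0)
    pts = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(200)]
    leaves = recursive_histogram(pts, (-1.0,) * 3, (1.0,) * 3, t=10)
    # Counts are conserved: every point is assigned to exactly one leaf.
    print(sum(c for _, _, c in leaves))
```

Enumerating all 2^d children is only practical for small d. For genuinely high-dimensional data, where most children are empty, a variant would group points by their bit pattern and recurse only into non-empty children, leaving empty cells implicit.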