Differentially Private Aggregation of Distributed Time-Series with Transformation and Encryption

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD international conference on Management of data |

Published by Association for Computing Machinery, Inc.

Publication

We propose PASTE, the first differentially private aggregation algorithms for distributed time-series data that offer good practical utility without any trusted server. PASTE addresses two important challenges in participatory data-mining applications where (i) individual users collect temporally correlated time-series data (such as location traces, web history, personal health data), and (ii) an untrusted third-party aggregator wishes to run aggregate queries on the data. To address this, PASTE incorporates two new algorithms. To ensure differential privacy for time-series data despite the presence of temporal correlation, PASTE uses the Fourier Perturbation Algorithm (FPAk). Standard differential privacy techniques perform poorly for time-series data. To answer n queries, such techniques can result in a noise of Θ(n) to each query answer, making the answers practically useless if n is large. Our FPAk algorithm perturbs the Discrete Fourier Transform of the query answers. For answering n queries, FPAk improves the expected error from Θ(n) to roughly Θ(k) where k is the number of Fourier coefficients that can (approximately) reconstruct all the n query answers. Our experiments show that k ≪ n for many real-life data-sets resulting in a huge error-improvement for FPAk. To deal with the absence of a trusted central server, PASTE uses the Distributed Laplace Perturbation Algorithm (DLPA) that adds noise in a distributed way in order to guarantee differential privacy. To the best of our knowledge, DLPA is the first distributed differentially private algorithm that can scale with a large number of users: DLPA outperforms the only other distributed solution for differential privacy proposed so far, by reducing the computational load per user from O(U) to O(1) where U is the number of users.