By Winnie Cui, Senior Research Manager, Microsoft Research Asia
While working on his master’s degree at Northwestern University, data scientist Anthony Tockar used publicly available location data to show how celebrities can be tracked throughout New York City. By cross-referencing that data with public news and photos of celebrities hailing cabs in NYC, Tockar found out exactly where celebrities climbed into cabs, where they traveled, and even how much they paid.
As this example shows, location-based services, which pull an individual’s location data from GPS, IP addresses, and Wi-Fi network mapping, can be a privacy nightmare. But they can also be incredibly valuable, offering real-time navigation, local weather, geographically targeted search engine results, and other useful functions.
A 2011 Microsoft survey, Location Usage & Perceptions, found that 94 percent of customers considered location-based services valuable. However, the same survey found that 52 percent were concerned about the privacy issues related to the use of geolocation data.
The privacy issue is now a focus of attention in the research community. “Today’s computing power and scale of publicly available data makes it easier to identify individuals from the data,” said Professor Xiaokui Xiao at Nanyang Technological University (NTU).
Recently, a collaboration between Professor Xiaokui Xiao’s team and Dr. Xing Xie’s group at Microsoft Research Asia in Beijing found a way that might alleviate these privacy concerns. The team proposes a data manipulation technique, called PrivTree, which pre-processes geolocation data to protect individual privacy. The privatized data can then be used in any subsequent analysis, or even made publicly available, without further risk to an individual’s privacy.
PrivTree works by mathematically “blurring” the geolocation information of a specific individual, while maintaining overall accuracy for the dataset as a whole. In the example below, individuals in the dataset are projected onto a map by their geolocation coordinates.
Next, PrivTree goes through two phases to “blur out” the geolocation information of each individual.
Phase 1: Map Partitioning
Phase 2: Location Perturbation
The result is a new set of data points that follows a distribution similar to that of the original data, but with the real location of each participant masked. The privatized data is then released as the output of PrivTree. PrivTree can be extended to support other kinds of location data, for example, your daily jogging route uploaded to a health app. The research paper, PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions, was accepted by ACM SIGMOD 2016, the world’s top data management conference.
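To make the two phases more concrete, here is a minimal Python sketch of the general idea. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes the map is split into four quadrants recursively (a quadtree), that Laplace noise is added to each cell's count, and that new points are drawn uniformly inside each final cell. The function names and the `scale`, `threshold`, and `max_depth` parameters are hypothetical, and the noise is not calibrated to any privacy budget, so this simplified version carries no formal privacy guarantee.

```python
import random

def laplace(rng, scale):
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def partition(points, box, rng, scale=1.0, threshold=10.0, depth=0, max_depth=6):
    """Phase 1 (sketch): split a cell into quadrants while its noisy count is large.

    Returns a list of (box, noisy_count) leaves; box is (x0, y0, x1, y1).
    """
    x0, y0, x1, y1 = box
    noisy = len(points) + laplace(rng, scale)  # assumed noise model
    if depth >= max_depth or noisy <= threshold:
        return [(box, max(noisy, 0.0))]
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    # Key is (is_right_half, is_top_half), so each point lands in exactly one child.
    boxes = {
        (False, False): (x0, y0, xm, ym),
        (True, False): (xm, y0, x1, ym),
        (False, True): (x0, ym, xm, y1),
        (True, True): (xm, ym, x1, y1),
    }
    groups = {key: [] for key in boxes}
    for x, y in points:
        groups[(x >= xm, y >= ym)].append((x, y))
    leaves = []
    for key, child_box in boxes.items():
        leaves.extend(partition(groups[key], child_box, rng, scale,
                                threshold, depth + 1, max_depth))
    return leaves

def perturb(leaves, rng):
    """Phase 2 (sketch): replace real points with random points in each leaf cell."""
    synthetic = []
    for (x0, y0, x1, y1), noisy in leaves:
        for _ in range(round(noisy)):
            synthetic.append((rng.uniform(x0, x1), rng.uniform(y0, y1)))
    return synthetic

if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up data: 300 points clustered in one corner of a unit-square "map".
    real = [(rng.uniform(0.0, 0.3), rng.uniform(0.0, 0.3)) for _ in range(300)]
    leaves = partition(real, (0.0, 0.0, 1.0, 1.0), rng)
    released = perturb(leaves, rng)
    print(len(leaves), "cells;", len(released), "synthetic points")
```

In this sketch, dense areas of the map end up divided into many small cells and sparse areas into a few large ones, so the synthetic points roughly preserve where people are concentrated while no released point corresponds to a real individual's coordinates.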
Of the collaboration with Microsoft researchers, Professor Xiao said, “Microsoft Research Asia’s expertise in managing large sets of geolocation data, such as Beijing taxi data, played a crucial role to the success of this project. It helped us develop and test our model.”
Professor Xiao plans to further integrate PrivTree techniques into Microsoft’s location-based services to provide privacy protection. Dr. Xing Xie, Senior Researcher at Microsoft Research Asia and a collaborator on this project, observed, “Data privacy is a critical challenge in the cloud computing era, especially for user-generated location data that contains a lot of private knowledge about individuals. We hope this joint work can contribute to, and eventually lead to, a safer world for everyone.”