Microsoft Research Blog

Project PrivTree: Blurring your “where” for location privacy

January 20, 2017 | By Microsoft blog editor

By Winnie Cui, Senior Research Manager, Microsoft Research Asia

While working on his master’s degree at Northwestern University, data scientist Anthony Tockar used publicly available location data to show how celebrities can be tracked throughout New York City. By cross-referencing public news and photos of celebrities hailing cabs in NYC, Tockar found out exactly where celebrities climbed into cabs, where they traveled and even how much they paid!

As this example shows, location-based services, which pull an individual’s location data from GPS, IP addresses and Wi-Fi network mapping, can be a privacy nightmare. But they can also be incredibly valuable, offering real-time navigation, local weather, geographically targeted search engine results, and other useful functions.

A 2011 Microsoft survey, Location Usage & Perceptions, found that 94 percent of customers considered location-based services valuable. However, the same survey found that 52 percent were concerned about the privacy issues related to the use of geolocation data.

The privacy issue is now a focus of attention in the research community. “Today’s computing power and scale of publicly available data makes it easier to identify individuals from the data,” said Professor Xiaokui Xiao at Nanyang Technological University (NTU).

Recently, a collaboration between Professor Xiaokui Xiao’s team and Dr. Xing Xie’s group at Microsoft Research Asia in Beijing found a way that might alleviate these privacy concerns. The team proposes a data manipulation technique, called PrivTree, which pre-processes geolocation data to protect individual privacy. The privatized data can then be safely used in any subsequent analysis, or even made publicly available, without further risk to an individual’s privacy.

PrivTree works by mathematically “blurring” the geolocation information of a specific individual, while maintaining overall accuracy for the dataset as a whole. In the example below, individuals in the dataset are projected onto a map by their geolocation coordinates.

Figure: PrivTree geolocation example. Each marker represents an individual in the geolocation database.

Next, PrivTree goes through two phases to “blur out” the geolocation information of each individual.

Phase 1: Map Partitioning

The map is partitioned into a few sub-regions, based on the density of the data points.
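
As a rough illustration of density-based partitioning, the sketch below recursively splits a rectangular map into four quadrants wherever a region holds more than a chosen number of points. It is a simplified, non-private stand-in and not the PrivTree algorithm itself: the names `Region`, `partition`, `max_points` and `max_depth` are hypothetical, and the split decision here uses exact counts, which a differentially private method would have to randomize.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    """Axis-aligned rectangle [x0, x1) x [y0, y1) and the points inside it."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point]


def partition(region: Region, max_points: int = 4,
              max_depth: int = 8, depth: int = 0) -> List[Region]:
    """Split dense regions into quadrants; return the resulting leaf regions.

    Assumption: a region is "dense" when it holds more than max_points points.
    The exact count is used here, so this step alone gives no privacy.
    """
    if len(region.points) <= max_points or depth >= max_depth:
        return [region]

    mid_x = (region.x0 + region.x1) / 2
    mid_y = (region.y0 + region.y1) / 2
    quadrants = [
        Region(region.x0, region.y0, mid_x, mid_y, []),  # lower left
        Region(mid_x, region.y0, region.x1, mid_y, []),  # lower right
        Region(region.x0, mid_y, mid_x, region.y1, []),  # upper left
        Region(mid_x, mid_y, region.x1, region.y1, []),  # upper right
    ]
    # Assign every point to exactly one quadrant.
    for x, y in region.points:
        index = (1 if x >= mid_x else 0) + (2 if y >= mid_y else 0)
        quadrants[index].points.append((x, y))

    leaves: List[Region] = []
    for quadrant in quadrants:
        leaves.extend(partition(quadrant, max_points, max_depth, depth + 1))
    return leaves


# Five points clustered in one corner and one isolated point.
points = [(0.10, 0.10), (0.20, 0.20), (0.15, 0.12),
          (0.05, 0.22), (0.12, 0.18), (0.90, 0.90)]
leaves = partition(Region(0.0, 0.0, 1.0, 1.0, points), max_points=2)
```

Dense areas end up covered by many small sub-regions, while sparse areas stay as a few large ones, which is the behavior the phase describes.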

Phase 2: Location Perturbation

Within each sub-region, data points are randomly removed, added or shuffled according to a statistical perturbation scheme, which protects individual privacy while maintaining the statistical accuracy of the dataset. Once location perturbation has been applied to every sub-region, a new geolocation database is ready to use.
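
One common way to perturb a sub-region, shown here only as an assumed sketch and not as the scheme PrivTree actually uses, is to add Laplace noise to the region's point count and then draw that many fresh points uniformly inside the region. The noisy count adds or removes points, and the uniform redraw shuffles locations. The names `laplace_noise`, `perturb_region` and `epsilon` are hypothetical, and the sketch assumes the region boundaries were themselves chosen in a privacy-preserving way.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_region(bounds: Bounds, true_count: int, epsilon: float,
                   rng: random.Random) -> List[Point]:
    """Return synthetic points for one sub-region.

    Assumption: adding or removing one individual changes the count by at
    most 1, so Laplace noise with scale 1/epsilon is applied to the count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    x0, y0, x1, y1 = bounds
    noisy_count = true_count + laplace_noise(1.0 / epsilon, rng)
    n = max(0, round(noisy_count))  # a count cannot be negative
    # Original coordinates are discarded; new ones are drawn uniformly.
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the example is repeatable
synthetic = perturb_region((0.0, 0.0, 1.0, 1.0), true_count=50,
                           epsilon=1.0, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy, at the cost of synthetic counts that stray further from the true ones.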

Together, the two phases produce a new set of data points that follows a distribution similar to that of the original data, but with the real location of each participant masked. The privatized data is then released as the output of PrivTree. PrivTree can be extended to support all kinds of location data, such as your daily jogging route uploaded to a health app. The research paper, “PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions,” was accepted by ACM SIGMOD 2016, the world’s top data management conference.

Of the collaboration with Microsoft researchers, Professor Xiao said, “Microsoft Research Asia’s expertise in managing large sets of geolocation data, such as Beijing taxi data, played a crucial role to the success of this project. It helped us develop and test our model.”

Professor Xiao plans to further integrate PrivTree techniques into Microsoft’s location-based services to provide privacy protection. Dr. Xing Xie, Senior Researcher at Microsoft Research Asia and a collaborator on this project, observed, “Data privacy is a critical challenge in the cloud computing era, especially for user-generated location data that contains a lot of private knowledge about individuals. We hope this joint work can contribute to, and eventually lead to, a safer world for everyone.”
