Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Project PrivTree: Blurring your “where” for location privacy

January 20, 2017 | By Microsoft blog editor

By Winnie Cui, Senior Research Manager, Microsoft Research Asia

While working on his master’s degree at Northwestern University, data scientist Anthony Tockar used publicly available location data to show how celebrities can be tracked throughout New York City. By cross-referencing public news and photos of celebrities hailing cabs in NYC, Tockar found out exactly where celebrities climbed into cabs, where they traveled and even how much they paid!

As this example shows, location-based services, which pull an individual’s location data from GPS, IP addresses and Wi-Fi network mapping, can be a privacy nightmare. But they can also be incredibly valuable, offering real-time navigation, local weather, geographically targeted search engine results, and other useful functions.

A 2011 Microsoft survey, Location Usage & Perceptions, found that 94 percent of customers considered location-based services valuable. However, the same survey found that 52 percent were concerned about the privacy issues related to the use of geolocation data.

The privacy issue is now a focus of attention in the research community. “Today’s computing power and scale of publicly available data makes it easier to identify individuals from the data,” said Professor Xiaokui Xiao at Nanyang Technological University (NTU).

Recently, a collaboration between Professor Xiao’s team and Dr. Xing Xie’s group at Microsoft Research Asia in Beijing found an approach that might alleviate these privacy concerns. The team proposes a data manipulation technique called PrivTree, which pre-processes geolocation data to protect individual privacy. The privatized data can then be safely used in any subsequent analysis, or even made publicly available, without further risk to an individual’s privacy.

PrivTree works by mathematically “blurring” the geolocation information of a specific individual, while maintaining overall accuracy for the dataset as a whole. In the example below, individuals in the dataset are projected onto a map by their geolocation coordinates.

PrivTree geolocation example

Each marker represents an individual in the geolocation database.

Next, PrivTree goes through two phases to “blur out” the geolocation information of each individual.

Phase 1: Map Partitioning

The map is partitioned into sub-regions based on the density of the data points, so that crowded areas are divided more finely than sparse ones.
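
As a rough illustration of density-based partitioning, the sketch below recursively splits a square region into four quadrants whenever it holds more than a threshold number of points. This is a plain quadtree, not the algorithm from the paper: the `Region` class, the `partition` function, and the `max_points` and `max_depth` parameters are hypothetical names chosen for this example, and a differentially private method would base its split decisions on noisy rather than exact counts.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Region:
    """Axis-aligned box [x0, x1) x [y0, y1) and the points inside it."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: List[Point]


def partition(region: Region, max_points: int = 4, max_depth: int = 8,
              depth: int = 0) -> List[Region]:
    """Split dense regions into quadrants; return the leaf sub-regions.

    Illustrative only: a private algorithm must not branch on the
    exact count the way this function does.
    """
    if len(region.points) <= max_points or depth >= max_depth:
        return [region]
    mx = (region.x0 + region.x1) / 2
    my = (region.y0 + region.y1) / 2
    leaves: List[Region] = []
    for x0, x1 in ((region.x0, mx), (mx, region.x1)):
        for y0, y1 in ((region.y0, my), (my, region.y1)):
            # Half-open intervals assign every point to exactly one quadrant.
            inside = [(x, y) for x, y in region.points
                      if x0 <= x < x1 and y0 <= y < y1]
            leaves.extend(partition(Region(x0, y0, x1, y1, inside),
                                    max_points, max_depth, depth + 1))
    return leaves


if __name__ == "__main__":
    # Six points clustered in one corner, two points elsewhere.
    pts = [(0.1, 0.1), (0.12, 0.15), (0.2, 0.05), (0.05, 0.2),
           (0.22, 0.22), (0.15, 0.18), (0.8, 0.7), (0.6, 0.9)]
    for leaf in partition(Region(0.0, 0.0, 1.0, 1.0, pts), max_points=3):
        print((leaf.x0, leaf.y0, leaf.x1, leaf.y1), len(leaf.points))
```

In this toy input the crowded corner is subdivided twice more than the rest of the map, which is the behavior the phase describes: small cells where data is dense, large cells where it is sparse.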

Phase 2: Location Perturbation

Using statistical analysis, individuals are subjected to a perturbation scheme in which they are randomly removed, added or shuffled to guarantee privacy while maintaining statistical accuracy. After location perturbation has been applied to each sub-region, a new geolocation database is ready to use.
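
One standard way to realize this kind of perturbation under differential privacy is the Laplace mechanism: add random noise to each sub-region's count, then draw that many synthetic points uniformly inside the sub-region. The sketch below assumes that approach; the post does not spell out the exact scheme, and `laplace_noise`, `perturb`, and the `epsilon` privacy parameter are illustrative names. For simplicity it also treats the sub-region boundaries as fixed in advance rather than derived from the private data.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)
Point = Tuple[float, float]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(regions: List[Tuple[Box, int]], epsilon: float,
            rng: random.Random) -> List[Point]:
    """Return synthetic points whose per-region counts are noisy.

    Each (box, count) pair gets Laplace noise with scale 1/epsilon
    (adding or removing one person changes one count by 1), and the
    noisy count is rounded and clamped at zero.
    """
    synthetic: List[Point] = []
    for (x0, y0, x1, y1), count in regions:
        noisy = max(0, round(count + laplace_noise(1.0 / epsilon, rng)))
        for _ in range(noisy):
            # Real positions are discarded; new ones are drawn uniformly.
            synthetic.append((rng.uniform(x0, x1), rng.uniform(y0, y1)))
    return synthetic


if __name__ == "__main__":
    rng = random.Random(0)
    regions = [((0.0, 0.0, 0.5, 0.5), 120), ((0.5, 0.0, 1.0, 0.5), 8),
               ((0.0, 0.5, 1.0, 1.0), 40)]
    released = perturb(regions, epsilon=1.0, rng=rng)
    print(len(released), "synthetic points released")
```

Smaller `epsilon` values add more noise and give stronger privacy; larger values keep the released counts closer to the originals.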

The result is a new set of data points that follows a similar distribution to the original data but masks the real location of each participant. The privatized data is then released as the output of PrivTree. PrivTree can be extended to support all kinds of location data, such as a daily jogging route uploaded to a health app. The research paper, “PrivTree: A Differentially Private Algorithm for Hierarchical Decompositions,” was accepted by ACM SIGMOD 2016, the world’s top data management conference.

Of the collaboration with Microsoft researchers, Professor Xiao said, “Microsoft Research Asia’s expertise in managing large sets of geolocation data, such as Beijing taxi data, played a crucial role to the success of this project. It helped us develop and test our model.”

Professor Xiao plans to further integrate PrivTree techniques into Microsoft’s location-based services to provide privacy protection. Dr. Xing Xie, Senior Researcher at Microsoft Research Asia and a collaborator on this project, observed, “Data privacy is a critical challenge in the cloud computing era, especially for user-generated location data that contains a lot of private knowledge about individuals. We hope this joint work can contribute to–and eventually lead to–a safer world for everyone.”
