Microsoft researchers enable secure data exchange in the cloud
By John Roach, Writer, Microsoft Research
In the future, machine learning algorithms may examine our genomes to determine our susceptibility to maladies such as heart disease and cancer. Between now and then, computer scientists need to train the algorithms on genetic data, bundles of which are increasingly stored encrypted and secure in the cloud along with financial records, vacation photos and other bits and bytes of digitized information.
And there the data sits, full of potential but ultimately of little use to anyone but its owner.
That’s because encrypted data must be decrypted before it can be used. But decrypted data is vulnerable to malicious attacks, which creates a tradeoff between data usability and security.
New research from Microsoft aims to unlock the full value of encrypted data by using the cloud itself to perform secure data trades between multiple willing parties in a way that provides users full control over how much information the exchange reveals.
“What we are trying to do is keep the data private and, at the same time, get the value out of it,” says Ran Gilad-Bachrach, a researcher in the Cryptography Research group within Microsoft’s research organization and co-author of a paper released in June that describes the protocol, or set of rules, for this system to securely exchange data.
The exchange is based on the idea of a secure multiparty computation, where two or more parties agree to evaluate their data in a way that one or more of the parties gets a result but none of the parties learns anything about the others’ data, except for what can be inferred from the result.
The multiparty computation is akin to a group of employees who want to know where their individual salary ranks in relation to the group as a whole, but none of them wants to reveal their pay to the group.
One way to solve this problem is for each individual to tell their salary in confidence to a trusted colleague. This colleague calculates the average salary and shares the result with the group. Each employee can determine where their pay falls without learning what any individual is paid. The trusted colleague conveniently forgets everything.
“This secure data exchange emulates that, but without the need for the trusted colleague,” says paper co-author Peter Rindal, a PhD candidate at Oregon State University who is in his second internship at Microsoft and an expert on secure multiparty computation.
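The salary example can be sketched in a few lines of Python. This is a textbook technique called additive secret sharing, shown only to illustrate how an average can be computed without a trusted colleague; it is not the protocol from the paper. The function names and the modulus are assumptions made for this sketch, and it assumes every employee follows the steps honestly.

```python
import secrets

# Public modulus (an assumption of this sketch); it only needs to be
# far larger than any possible salary total.
MODULUS = 2**61 - 1

def split_into_shares(value, n_parties, modulus=MODULUS):
    """Split value into n_parties random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]

def secure_average(salaries, modulus=MODULUS):
    """Simulate all employees in one process; in practice each runs separately."""
    n = len(salaries)
    # Row i holds the shares employee i hands out, one per colleague.
    distributed = [split_into_shares(s, n, modulus) for s in salaries]
    # Employee j adds up the shares received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums add up to the true total, and nothing else.
    total = sum(partial_sums) % modulus
    return total / n

salaries = [52_000, 61_500, 48_250, 75_000]
print(secure_average(salaries))  # 59187.5
```

Each individual share is a uniformly random number, so no single colleague learns anything about another's salary from the share they receive. Only the final average is revealed.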
The cloud, according to the researchers, is a key feature of the exchange. It transforms a computation technique used to resolve water cooler disputes over pay into a secure system to train algorithms, perform market research, conduct auctions and enable new business opportunities.
Exchange in action
Here’s how it works:
Data owners – hundreds, thousands of them – encrypt their data and send it to the cloud for storage. Think of them as relatively passive sellers in the exchange. When an active buyer – usually one entity – comes along and wants to make a transaction with some of the sellers, those sellers approve the transaction by sending the buyer keys to the data.
But since those keys can decrypt the data stored in the cloud, the cloud can’t directly share the stored data with the buyer, otherwise security and privacy would be compromised.
“Instead, we want to use the keys to decrypt the data inside a multiparty computation,” says paper co-author Kim Laine, a post-doctoral researcher also in the Cryptography Research group who studies how to compute on encrypted data. Doing so decrypts the data for a computation “without actually revealing anything to anyone except the result” of the computation.
All of the computation is performed in the cloud, and the computation itself is encrypted in such a way that not even the cloud knows what is being computed, which protects any of the buyer’s data used in the computation such as a proprietary algorithm. If everything goes as expected, the cloud reveals the decrypted results to the interested parties.
Set up this way, according to the researchers, the data exchange is secure provided that the cloud itself follows the rules and nothing more.
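A toy Python simulation can make the flow of ciphertexts and keys concrete. It is a deliberately simplified sketch, not the researchers' protocol: it uses a one-time additive mask as the "encryption," supports only a sum rather than an arbitrary computation, and does not hide the computation from the cloud. The names seller_encrypt, cloud_aggregate and buyer_unmask are hypothetical, and the sketch assumes the cloud and the buyer follow the rules and do not collude.

```python
import secrets

# Public modulus (an assumption of this sketch), larger than any possible total.
MODULUS = 2**61 - 1

def seller_encrypt(value, modulus=MODULUS):
    """Seller side: mask a value with a fresh random key (one-time pad mod modulus)."""
    key = secrets.randbelow(modulus)
    ciphertext = (value + key) % modulus
    return ciphertext, key

def cloud_aggregate(ciphertexts, modulus=MODULUS):
    """Cloud side: combine the stored ciphertexts without ever seeing a key."""
    return sum(ciphertexts) % modulus

def buyer_unmask(masked_total, keys, modulus=MODULUS):
    """Buyer side: strip the combined keys from the cloud's aggregate."""
    return (masked_total - sum(keys)) % modulus

# Three sellers encrypt their values; ciphertexts go to the cloud for storage.
seller_values = [12, 7, 30]
encrypted = [seller_encrypt(v) for v in seller_values]
cloud_storage = [c for c, _ in encrypted]

# Sellers approve the transaction by sending their keys to the buyer.
approved_keys = [k for _, k in encrypted]

# The cloud releases only the aggregate, never an individual ciphertext.
masked_total = cloud_aggregate(cloud_storage)
print(buyer_unmask(masked_total, approved_keys))  # 49
```

In this sketch the cloud holds ciphertexts but no keys, and the buyer holds keys but sees only the aggregate, so neither can recover an individual seller's value on its own. The buyer learns the total and nothing more.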
Test driving data
Here’s another advantage to the system: It’s costly to purchase data, and researchers with limited budgets need to make sure it is worth it. The exchange, Gilad-Bachrach explains, offers a way for a buyer to “test drive” a portion of the sellers’ data and thus make an informed decision about whether to buy the keys to unlock the full dataset.
Consider researchers at a pharmaceutical company who are developing a machine learning model that combs through genomes to determine individuals’ risk of various diseases. To improve the model and further study it, the researchers are interested in buying access to a medical center’s bundle of anonymized patient genomes, but only if the bundle contains distinctly different data than what the researchers have already used.
“We call this ‘can we test drive your data’ because why would you buy anything without knowing what you are buying,” says Laine. “But the problem with data is you can’t just show it.”
The secure data exchange system allows the researchers to perform a statistical analysis on a portion of the medical center’s anonymized genetic data that reveals how much it differs from the data already used to build the disease-prediction algorithm. After this test drive, the researchers can decide whether to buy the keys to the full bundle.
“What we are trying to build,” Gilad-Bachrach says, “is a mechanism by which you can say, ‘Look, I am interested in your data, but I want to verify it is really what I need before I purchase it.’”
Real world applications
In another use of the exchange, a medical center could compare the outcomes of its treatment plan for pneumonia with the outcomes of treatment plans used at other medical centers without any one medical center revealing what treatment plan it uses. That avoids the risk of getting called out for using a less effective treatment.
Individuals could even use the exchange as a marketplace to sell researchers access to their encrypted genetic data for algorithm training. Ultimately, Laine notes, the researchers might develop an algorithm that uses the exchange to communicate to participants whether or not their genome contains a specific mutation related to a health concern such as heart disease or cancer.
“If you are a match,” notes Laine, “you can decide if you want to contact the research group.”
It’s a research project for now. But the team aims to publicly release the library, or tools, needed to implement the secure data exchange in the near future.