{"id":793088,"date":"2021-11-09T08:35:01","date_gmt":"2021-11-09T16:35:01","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=793088"},"modified":"2021-11-09T14:32:28","modified_gmt":"2021-11-09T22:32:28","slug":"privacy-preserving-machine-learning-maintaining-confidentiality-and-preserving-trust","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/privacy-preserving-machine-learning-maintaining-confidentiality-and-preserving-trust\/","title":{"rendered":"Privacy Preserving Machine Learning: Maintaining confidentiality and preserving trust"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-scaled.jpg\" alt=\"A diagram with the title: Privacy Preserving Machine Learning: A holistic approach to protecting privacy. In the center of the diagram, there is a circle with the word \u201cTrust.\u201d There are five callouts coming from the circle. Moving clockwise, they are: Privacy & confidentiality; Transparency; Empower innovation; Security; Current & upcoming policies and regulations.\"\/><\/figure>\n\n\n\n<p>Machine learning (ML) offers tremendous opportunities to increase productivity. However, ML systems are only as good as the quality of the data that informs the training of ML models. And training ML models requires a significant amount of data, more than a single individual or organization can contribute. 
By sharing data to collaboratively train ML models, we can unlock value and develop powerful language models that are applicable to a wide variety of scenarios, such as <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/insider.office.com\/en-us\/blog\/text-predictions-in-word-outlook\">text prediction<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/msai\/articles\/assistive-ai-makes-replying-easier-2\/\">email reply suggestions<\/a>. At the same time, we recognize the need to preserve the confidentiality and privacy of individuals and earn and maintain the trust of the people who use our products. Protecting the confidentiality of our customers\u2019 data is core to our mission. This is why we\u2019re excited to share the work we\u2019re doing as part of the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/privacy-preserving-machine-learning-innovation\/\">Privacy Preserving Machine Learning<\/a> (PPML) initiative.<\/p>\n\n\n\n<p>The PPML initiative was started in partnership between Microsoft Research and Microsoft product teams with the objective of protecting the confidentiality and privacy of customer data when training large-capacity language models. The goal of the PPML initiative is to improve existing techniques and develop new ones for protecting sensitive information that work for both individuals and enterprises. This helps ensure that our use of data protects people\u2019s privacy and the data is utilized in a safe fashion, avoiding leakage of confidential and private information.<\/p>\n\n\n\n<p>This blog post discusses emerging research on combining techniques to ensure privacy and confidentiality when using sensitive data to train ML models. 
We illustrate how employing PPML can support our ML pipelines in meeting stringent privacy requirements and how it equips our researchers and engineers with the tools they need to do so. We also discuss how applying best practices in PPML enables us to be transparent about how customer data is applied.<\/p>\n\n\n\n<h2 id=\"a-holistic-approach-to-ppml\">A holistic approach to PPML<\/h2>\n\n\n\n<p>Recent research has shown that deploying ML models can, in some cases, implicate privacy in unexpected ways. For example, pretrained public language models that are fine-tuned on private data can be <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/analyzing-information-leakage-of-updates-to-natural-language-models\/\">misused to recover private information<\/a>, and very large language models have been shown to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/abs\/2012.07805\">memorize training examples<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, potentially encoding personally identifying information (PII). Finally, inferring that a specific user was part of the training data can also <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/ieeexplore.ieee.org\/document\/7958568\">impact privacy<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Therefore, we believe it&#8217;s critical to apply multiple techniques to achieve privacy and confidentiality; no single method can address all aspects alone. This is why we take a three-pronged approach to PPML: understanding the risks and requirements around privacy and confidentiality, measuring the risks, and mitigating the potential for breaches of privacy. 
We explain the details of this multi-faceted approach below.<\/p>\n\n\n\n<p><strong>Understand<\/strong>: We work to understand the risk of customer data leakage and potential privacy attacks in a way that helps determine confidentiality properties of ML pipelines. In addition, we believe it\u2019s critical to proactively align with policy makers. We take into account local and international laws and guidance regulating data privacy, such as the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/gdpr-info.eu\/\">General Data Protection Regulation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (GDPR) and the EU\u2019s policy on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/digital-strategy.ec.europa.eu\/en\/policies\/european-approach-artificial-intelligence\">trustworthy AI<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. We then map these legal principles, our contractual obligations, and responsible AI principles to our technical requirements and develop tools to communicate with policy makers how we meet these requirements.<\/p>\n\n\n\n<p><strong>Measure<\/strong>: Once we understand the risks to privacy and the requirements we must adhere to, we define metrics that can quantify the identified risks and track success towards mitigating them.<\/p>\n\n\n\n<p><strong>Mitigate<\/strong>: We then develop and apply mitigation strategies, such as differential privacy (DP), described in more detail later in this blog post. 
After we apply mitigation strategies, we measure their success and use our findings to refine our PPML approach.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"344\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-1024x344.png\" alt=\"A diagram that depicts the three-pronged approach to PPML, defined as: Understand, then Measure, and then Mitigate. Specific details under each approach are detailed. Under Understand, the three bullet items are: 1) conduct threat modeling and attack research, 2) identify confidentiality properties and guarantees, and 3) understand regulatory requirements. Under Measure, the two bullet items are: 1) capture vulnerabilities quantitatively, and 2) develop and apply frameworks to monitor risks and mitigation success. Under Mitigate, the two bullet items are: 1) develop and apply techniques to reduce privacy risks, and 2) meet legal and compliance rules and regulations.\" class=\"wp-image-793268\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-1024x344.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-300x101.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-768x258.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-1536x517.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-2048x689.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/PPML_Edited_Diagram-1-240x81.png 240w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>PPML is informed by a three-pronged approach: 1) understanding the risk and regulatory requirements, 2) measuring the 
vulnerability and success of attacks, and 3) mitigating the risk.<\/figcaption><\/figure><\/div>\n\n\n\n<h2 id=\"ppml-in-practice\">PPML in practice<\/h2>\n\n\n\n<p>Several different technologies contribute to PPML, and we implement them for a number of different use cases, including threat modeling and preventing the leakage of training data. For example, in the following <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/insider.office.com\/en-us\/blog\/text-predictions-in-word-outlook\">text-prediction<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> scenario, we took a holistic approach to preserving data privacy and collaborated across Microsoft Research and product teams, layering multiple PPML techniques and developing quantitative metrics for risk assessment.<\/p>\n\n\n\n<p>We recently developed a personalized assistant for composing messages and documents by using the latest natural language generation models, developed by <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/turing.microsoft.com\/\">Project Turing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Its transformer-based architecture uses attention mechanisms to predict the end of a sentence based on the current text and other features, such as the recipient and subject. Using large transformer models is risky in that individual training examples can be memorized and reproduced when making predictions, and these examples can contain sensitive data. As such, we developed a strategy to both identify and remove potentially sensitive information from the training data, and we took steps to mitigate memorization tendencies in the training process. 
We combined careful sampling of data, PII scrubbing, and DP model training (discussed in more detail below).<\/p>\n\n\n\n<h2 id=\"mitigating-leakage-of-private-information\">Mitigating leakage of private information<\/h2>\n\n\n\n<p>We use security best practices to help protect customer data, including strict eyes-off handling by data scientists and ML engineers. Still, such mitigations cannot prevent subtler methods of privacy leakage, such as training data memorization in a model that could subsequently be extracted and linked to a user. That is why we employ state-of-the-art privacy protections provided by DP and continue to contribute to the cutting-edge research in this field. For privacy-impacting use cases, our policies require a security review, a privacy review, and a compliance review, each including domain-specific quantitative risk assessments and application of appropriate mitigations.<\/p>\n\n\n\n<h3 id=\"differential-privacy\">Differential privacy<\/h3>\n\n\n\n<p>Microsoft <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.eatcs.org\/index.php\/component\/content\/article\/1-news\/2450-2017-godel-prize\">pioneered DP research back in 2006<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and DP has since been established as the de facto privacy standard, with a vast body of academic literature and a growing number of large-scale deployments across the industry (e.g., <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/collecting-telemetry-data-privately\/\">DP in Windows<\/a> telemetry or <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/viva\/insights\/Privacy\/differential-privacy\">DP in Microsoft Viva Insights<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>) and government. 
In ML scenarios, DP works by adding small amounts of statistical noise during training, the purpose of which is to conceal the contributions of individual parties whose data is being used. When DP is employed, a mathematical proof validates that the final ML model learns only general trends in the data without acquiring information unique to any specific party. Differentially private computations entail the notion of a privacy budget, \u03f5, which imposes a strict upper bound on the information that might leak from the process. This guarantees that no matter what auxiliary information an external adversary may possess, their ability to learn something new from the model about any individual party whose data was used in training is severely limited.<\/p>\n\n\n\n<p>In recent years, we have been pushing the boundaries in DP research with the overarching goal of providing Microsoft customers with the best possible productivity experiences through improved ML models for natural language processing (NLP) while providing highly robust privacy protections.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>In the Microsoft Research papers <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/differentially-private-set-union\/\" target=\"_blank\" rel=\"noreferrer noopener\">Differentially Private Set Union<\/a> and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/differentially-private-n-gram-extraction\/\" target=\"_blank\" rel=\"noreferrer noopener\">Differentially private n-gram extraction<\/a>, we developed new algorithms for exposing frequent items, such as unigrams or n-grams coming from customer data, while adhering to the stringent guarantees of DP. 
Our algorithms have been deployed in production to improve systems such as <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/msai\/articles\/assistive-ai-makes-replying-easier-2\/\" target=\"_blank\" rel=\"noreferrer noopener\">assisted response generation<\/a>.<\/li><li>In the Microsoft Research paper <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/numerical-composition-of-differential-privacy\/\" target=\"_blank\" rel=\"noreferrer noopener\">Numerical Composition of Differential Privacy<\/a>, we developed a new DP accountant that gives a more accurate result for the expended privacy budget when training on customer data. This is particularly important when training on enterprise data, where the dataset typically contains far fewer participants. With the new DP accountant, we can train models for longer, thereby achieving higher utility while using the same privacy budget.<\/li><li>Finally, in our recent paper <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/differentially-private-fine-tuning-of-language-models\/\" target=\"_blank\" rel=\"noreferrer noopener\">Differentially private fine-tuning of language models<\/a>, we demonstrate that one can privately fine-tune very large foundation NLP models, such as GPT-2, nearly matching the accuracy of nonprivate fine-tuning. Our results build on recent advances in parameter-efficient fine-tuning methods and our earlier work on improved accounting for privacy.<\/li><\/ul>\n\n\n\n<p>When training or fine-tuning machine learning models on customer content, we adhere to a strict policy regarding the privacy budget<sup><a id=\"r1\" href=\"#fn1\">[1]<\/a><\/sup>. 
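<\/p>\n\n\n\n<p>To make the mechanism concrete, the following toy sketch shows one differentially private gradient step in the spirit of DP-SGD: each example\u2019s gradient is clipped to bound any single party\u2019s influence, and Gaussian noise scaled to that bound is added before averaging. The function name and parameter values are illustrative assumptions, and no privacy accounting is shown; this is not our production training code.<\/p>

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One simplified DP-SGD step (toy sketch, not production code)."""
    rng = rng or random.Random(0)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Clip: no single party's gradient can exceed clip_norm in length.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i in range(dim):
            total[i] += g[i] * scale
    # Gaussian noise scaled to the clipping bound conceals any one contribution.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

# Two parties' gradients; without clipping, the first would dominate.
grads = [[3.0, 4.0], [0.1, -0.2]]
noisy_avg = dp_sgd_step(grads)
```

<p>In a real system, the noise multiplier is chosen together with a privacy accountant so that the total privacy budget \u03f5 stays within policy.<\/p>\n\n\n\n<p>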
<\/p>\n\n\n\n<h3 id=\"threat-modeling-and-leakage-analysis\">Threat modeling and leakage analysis<\/h3>\n\n\n\n<p>Even though DP is considered the gold standard for mitigation, we go one step further and perform threat modeling to study the actual risk before and after mitigation. Threat modeling considers the possible ways an ML system can be attacked. We have implemented threat modeling by studying realistic and relevant attacks, such as the tab attack (discussed below) in a black-box setting, and we have considered and implemented novel attack angles that are very relevant to production models, such as the model update attack. We study attacks that go beyond the extraction of training data and approximate more abstract leakage, such as attribute inference. Once we have established threat models, we use those attacks to define privacy metrics. We then work to make sure all of these attacks are mitigated, and we continuously monitor their success rates. Read further to learn about some of the threat models and leakage analyses we use as part of our PPML initiative.<\/p>\n\n\n\n<p><strong>Model update attacks<\/strong>. 
In the paper <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/analyzing-information-leakage-of-updates-to-natural-language-models\/\" target=\"_blank\" rel=\"noreferrer noopener\">Analyzing Information Leakage of Updates to Natural Language Models<\/a>, a Microsoft Research team introduced a new threat model where multiple snapshots of a model are accessible to a user, as is the case with predictive keyboards. They proposed using model update attacks to analyze leakage in practical settings, where language models are frequently updated by adding new data, fine-tuning public pre-trained language models on private data, or deleting user data to comply with privacy law requirements. The results showed that access to such snapshots can leak phrases that were used to update the model. Based on the attack, leakage analyses of text prediction models can be performed without the need to monitor them.<\/p>\n\n\n\n<p><strong>Tab attacks<\/strong>. Tab attacks can occur when an attacker has access to the top-1 predictions of a language model and the text auto-completion feature, in an email app for example, is invoked by pressing the Tab key. It\u2019s well known that large language models can memorize individual training instances, and recent work has demonstrated that practical attacks can extract verified training instances from GPT-2. In the paper <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2101.05405\" target=\"_blank\" rel=\"noopener noreferrer\">Training Data Leakage Analysis in Language Models<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a team of Microsoft 
researchers established an approach to vetting a language model for training data leakage. This approach enables the model builder to quantify the extent to which training examples can be extracted from the model using a practical attack. The model owner can use this method to verify that mitigations are performing as expected and determine whether a model is safe to deploy.<\/p>\n\n\n\n<p><strong>Poisoning attacks<\/strong>. In the paper <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/pdf\/2101.11073.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">Property Inference from Poisoning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, Microsoft researchers and an affiliated academic considered the consequences of a scenario where some of the training data is intentionally manipulated to cause more privacy leakage. This type of data compromise can occur, for example, in a collaborative learning setting where data from several parties or tenants is combined to achieve a better model and one of the parties is behaving dishonestly. The paper illustrates how such a party can manipulate their data to extract aggregate statistics about the rest of the training set. In the paper\u2019s example, several parties pool their data to train a spam classifier. 
If one of those parties has malicious intent, it can use the model to obtain the average sentiment of the emails in the rest of the training set, demonstrating the need to take particular care to ensure that the data used in such joint training scenarios is trustworthy.<\/p>\n\n\n\n<h2 id=\"future-areas-of-focus-for-ppml\">Future areas of focus for PPML<\/h2>\n\n\n\n<p>As we continue to apply and refine our PPML processes with the intent of further enhancing privacy guarantees, we recognize that the more we learn, the larger the scope becomes for addressing privacy concerns across the entire pipeline. We will continue focusing on:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Following regulations around privacy and confidentiality<\/li><li>Proving privacy properties for each step of the training pipeline<\/li><li>Making privacy technology more accessible to product teams<\/li><li>Applying decentralized learning<\/li><li>Investigating training algorithms for private federated learning, combining causal and federated learning, using federated reinforcement learning principles, federated optimization, and more<\/li><li>Using <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/learning-with-weak-supervision\/\">weakly supervised learning<\/a> technologies to enable model development without direct access to the data<\/li><\/ul>\n\n\n\n<h3 id=\"decentralized-learning-federated-learning-and-its-potential\">Decentralized learning: Federated learning and its potential<\/h3>\n\n\n\n<p>With users becoming more concerned about how their data is handled, and with regulations growing ever stronger, customers are applying increasingly rigorous controls to how they process and store data. 
As a result, more and more data is stored in inaccessible locations or on user devices without the option of curating it for centralized training.<\/p>\n\n\n\n<p>To this end, the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/1912.04977\" target=\"_blank\" rel=\"noopener noreferrer\">federated learning<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (FL) paradigm has been proposed, addressing privacy concerns while continuing to process such inaccessible data. The proposed approach aims to train ML models, for example, deep neural networks, on data found in local worker nodes, such as data silos or user devices, without any raw data leaving the node. A central coordinator dispatches a copy of the model to the nodes, which individually compute a local update. The updates are then communicated back to the coordinator, where they are federated, for example, by averaging across the updates. The promise of FL is that raw training data remains within its local node. However, this might not mitigate all privacy risks, and additional mitigations, such as DP, are usually required.<\/p>\n\n\n\n<h3 id=\"secure-and-confidential-computing-environments\">Secure and confidential computing environments<\/h3>\n\n\n\n<p>When dealing with highly private data, our customers may hesitate to bring their data to the cloud at all. 
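<\/p>\n\n\n\n<p>The federated averaging loop described in the previous section can be sketched as a toy round for a one-parameter linear model. The node data, learning rate, and function names are illustrative assumptions; real FL systems add secure aggregation, DP, and far more machinery:<\/p>

```python
def local_update(weights, examples, lr=0.1):
    """One gradient step on a node's private data (least-squares loss).
    Raw examples never leave the node; only updated weights are returned."""
    dim = len(weights)
    grad = [0.0] * dim
    for x, y in examples:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i in range(dim):
            grad[i] += err * x[i] / len(examples)
    return [w - lr * g for w, g in zip(weights, grad)]

def federated_round(weights, nodes):
    """The coordinator dispatches the model, each node computes a local
    update, and the updates are federated by simple averaging."""
    updates = [local_update(list(weights), data) for data in nodes]
    return [sum(u[i] for u in updates) / len(updates) for i in range(len(weights))]

# Two data silos jointly fitting y = 2*x without sharing raw examples.
nodes = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0), ([-1.0], -2.0)],
]
weights = [0.0]
for _ in range(50):
    weights = federated_round(weights, nodes)
```

<p>After a few dozen rounds the averaged model converges to the shared trend (a slope near 2), even though the coordinator never sees either silo\u2019s raw data.<\/p>\n\n\n\n<p>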
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/azure.microsoft.com\/en-us\/solutions\/confidential-compute\/\">Azure confidential computing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> uses trusted execution environments (TEEs), backed by hardware security guarantees, to enable data analytics and ML algorithms to be computed on private data with the guarantee that cloud administrators, malicious actors that breach the cloud tenancy boundary, and even the cloud provider itself cannot gain access to the data. This enables the collaboration of multiple customers on private data without the need to trust the cloud provider.<\/p>\n\n\n\n<p>While TEEs&nbsp;leverage specific&nbsp;hardware&nbsp;for security guarantees, cryptographic&nbsp;secure computing&nbsp;solutions,&nbsp;such as&nbsp;secure&nbsp;multi-party&nbsp;computation (MPC)&nbsp;and&nbsp;fully&nbsp;homomorphic&nbsp;encryption (FHE), can enable&nbsp;data to be processed&nbsp;under a layer&nbsp;of strong encryption. 
MPC refers to a set of&nbsp;cryptographic protocols&nbsp;that allows multiple parties to compute functions&nbsp;on their&nbsp;joint private&nbsp;inputs without revealing anything other than the output of the function to each other.&nbsp;FHE refers to&nbsp;a&nbsp;special type of encryption that allows computing to&nbsp;be&nbsp;done directly on encrypted data so that only the owner of the secret decryption key can reveal the result of the computation.&nbsp;Microsoft has developed one of the most popular FHE libraries,&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/Microsoft\/SEAL\">Microsoft SEAL<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.&nbsp;&nbsp;<\/p>\n\n\n\n<p>However, both&nbsp;MPC and FHE&nbsp;have seen&nbsp;only&nbsp;limited&nbsp;use&nbsp;due to their computational performance overhead and lack of developer tooling for&nbsp;nonexperts.&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/ezpc-easy-secure-multi-party-computation\/\" target=\"_blank\" rel=\"noreferrer noopener\">Easy Secure Multi-Party Computation<\/a>&nbsp;(EzPC)&nbsp;is an end-to-end&nbsp;MPC&nbsp;system that&nbsp;solves these two&nbsp;challenges. It&nbsp;takes as input standard TensorFlow&nbsp;or&nbsp;ONNX code for ML inference and outputs MPC protocols that are highly performant.&nbsp;EzPC enables the use of state-of-the-art ML algorithms for inference tasks. 
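<\/p>\n\n\n\n<p>To give a flavor of the idea behind MPC, the sketch below implements additive secret sharing over a prime field, letting three parties compute the sum of their inputs so that no party sees another\u2019s value. This is a minimal honest-but-curious toy for a single sum, not the EzPC protocol stack:<\/p>

```python
import random

P = 2 ** 61 - 1  # prime modulus for the secret-sharing field

def share(secret, n_parties, rng):
    """Split a secret into n additive shares; any n-1 shares look random."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def mpc_sum(secrets, rng=None):
    """Each party secret-shares its input; party i locally adds the i-th
    share of every input and publishes only that partial sum. Recombining
    the partial sums reveals the total and nothing else."""
    rng = rng or random.Random(0)
    n = len(secrets)
    all_shares = [share(s, n, rng) for s in secrets]
    partials = [sum(all_shares[j][i] for j in range(n)) % P for i in range(n)]
    return sum(partials) % P

mpc_sum([12, 30, 45])  # == 87, computed without any party seeing another's input
```

<p>Real MPC systems such as EzPC extend this idea from a single sum to the multiplications and nonlinearities of full neural-network inference, which is where the performance and tooling challenges arise.<\/p>\n\n\n\n<p>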
Experimentally, EzPC has recently been applied to <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/secure-medical-image-analysis-with-cryptflow\/\" target=\"_blank\" rel=\"noreferrer noopener\">secure medical image analysis<\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2107.10230\" target=\"_blank\" rel=\"noopener noreferrer\">secure medical imaging AI validation<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> research software, successfully demonstrating the system\u2019s ability to execute the algorithms without accessing the underlying data.<\/p>\n\n\n\n<h3 id=\"broader-opportunities-for-ppml\">Broader opportunities for PPML<\/h3>\n\n\n\n<p>Advances in technology can present tremendous opportunities along with potentially equally significant risks. We aim to create leading-edge tools for realizing technology ethics from principle to practice, engage at the intersection of technology and policy, and work to ensure that the continued advancement of technology is responsible, privacy protective, and beneficial to society.<\/p>\n\n\n\n<p>However, even with the technologies discussed above, there continue to be outstanding questions in the PPML space. For example, can we arrive at tighter theoretical bounds for DP training and enable improved privacy-utility trade-offs? Will we be able to train ML models from synthetic data in the future? Finally, can we tightly integrate privacy and confidentiality guarantees into the design of the next generation of deep learning models?<\/p>\n\n\n\n<p>At Microsoft Research, we\u2019re working to answer these questions and deliver the best productivity experiences afforded by the sharing of data to train ML models while preserving the privacy and confidentiality of data. 
Please visit our <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/privacy-preserving-machine-learning-innovation\/\">Privacy Preserving Machine Learning Group<\/a> page and learn more about the holistic approach we\u2019re taking to unlock the full potential of enterprise data for intelligent features while honoring our commitment to keep customer data private.<\/p>\n\n\n\n<p>For the latest discussions and developments on privacy and security, you can view parts <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/researchsummit.microsoft.com\/sessions\/details\/bc7bbd43-4f47-4583-b3f7-58fa411dfb7e\">1<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/researchsummit.microsoft.com\/sessions\/details\/2b78cd68-6da1-49e6-bf3a-cdd0a885051e\">2<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> of the Future of Privacy and Security track on-demand from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/researchsummit.microsoft.com\/home\">Microsoft Research Summit<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<h3 id=\"appendix\">Appendix<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>If you\u2019re interested in learning more\u00a0about the different ways Microsoft protects your data, please visit the\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/trust-center\/privacy\" target=\"_blank\" rel=\"noreferrer noopener\">Microsoft Trust Center<\/a>.\u00a0\u00a0<\/li><li>Read more about how Microsoft approaches\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" 
href=\"https:\/\/docs.microsoft.com\/en-us\/microsoft-365\/compliance\/office-365-encryption-in-the-microsoft-cloud-overview?view=o365-worldwide\" target=\"_blank\" rel=\"noopener noreferrer\">encryption in the cloud<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.\u00a0<\/li><li>Learn about\u00a0the\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/servicetrust.microsoft.com\/ViewPage\/TrustDocumentsV3?command=Download&downloadType=Document&downloadId=ede6342e-d641-4a9b-9162-7d66025003b0&tab=7f51cb60-3d6c-11e9-b2af-7bb9f5d2d913&docTab=7f51cb60-3d6c-11e9-b2af-7bb9f5d2d913_Subprocessor_List\" target=\"_blank\" rel=\"noopener noreferrer\">data protection resources<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0Microsoft provides its customers.\u00a0<\/li><\/ul>\n\n\n\n<p class=\"has-small-font-size\"><hr>\n<p id=\"fn1\"><a href=\"#r1\">[1]<\/a> The maximum amount of privacy budget that can be consumed from each party whose data is involved in training a model over a period of six months is limited to \u03f5=4.<br><\/p><\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning (ML) offers tremendous opportunities to increase productivity. However, ML systems are only as good as the quality of the data that informs the training of ML models. And training ML models requires a significant amount of data, more than a single individual or organization can contribute. 
By sharing data to collaboratively train ML [&hellip;]<\/p>\n","protected":false},"author":40735,"featured_media":793244,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13558],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-793088","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-security-privacy-cryptography","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[559983,644373,761911,793670,1054512],"related-projects":[675777,556311,507611],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Victor Ruehle","user_id":41027,"display_name":"Victor Ruehle","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/virueh\/\" aria-label=\"Visit the profile page for Victor Ruehle\">Victor Ruehle<\/a>","is_active":false,"last_first":"Ruehle, Victor","people_section":0,"alias":"virueh"},{"type":"user_nicename","value":"Robert Sim","user_id":36650,"display_name":"Robert Sim","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/rsim\/\" aria-label=\"Visit the profile page for Robert Sim\">Robert Sim<\/a>","is_active":false,"last_first":"Sim, Robert","people_section":0,"alias":"rsim"},{"type":"user_nicename","value":"Sergey 
Yekhanin","user_id":34990,"display_name":"Sergey Yekhanin","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yekhanin\/\" aria-label=\"Visit the profile page for Sergey Yekhanin\">Sergey Yekhanin<\/a>","is_active":false,"last_first":"Yekhanin, Sergey","people_section":0,"alias":"yekhanin"},{"type":"user_nicename","value":"Nishanth Chandran","user_id":33084,"display_name":"Nishanth Chandran","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nichandr\/\" aria-label=\"Visit the profile page for Nishanth Chandran\">Nishanth Chandran<\/a>","is_active":false,"last_first":"Chandran, Nishanth","people_section":0,"alias":"nichandr"},{"type":"user_nicename","value":"Melissa Chase","user_id":32878,"display_name":"Melissa Chase","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/melissac\/\" aria-label=\"Visit the profile page for Melissa Chase\">Melissa Chase<\/a>","is_active":false,"last_first":"Chase, Melissa","people_section":0,"alias":"melissac"},{"type":"user_nicename","value":"Daniel Jones","user_id":41030,"display_name":"Daniel Jones","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jonesdaniel\/\" aria-label=\"Visit the profile page for Daniel Jones\">Daniel Jones<\/a>","is_active":false,"last_first":"Jones, Daniel","people_section":0,"alias":"jonesdaniel"},{"type":"user_nicename","value":"Kim Laine","user_id":32546,"display_name":"Kim Laine","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/kilai\/\" aria-label=\"Visit the profile page for Kim Laine\">Kim Laine<\/a>","is_active":false,"last_first":"Laine, Kim","people_section":0,"alias":"kilai"},{"type":"user_nicename","value":"Boris K\u00f6pf","user_id":37857,"display_name":"Boris K&ouml;pf","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/bokoepf\/\" aria-label=\"Visit the profile page for Boris K&ouml;pf\">Boris 
K&ouml;pf<\/a>","is_active":false,"last_first":"K\u00f6pf, Boris","people_section":0,"alias":"bokoepf"},{"type":"user_nicename","value":"Jaime Teevan","user_id":33975,"display_name":"Jaime Teevan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/teevan\/\" aria-label=\"Visit the profile page for Jaime Teevan\">Jaime Teevan<\/a>","is_active":false,"last_first":"Teevan, Jaime","people_section":0,"alias":"teevan"},{"type":"guest","value":"jim-kleewein","user_id":"786892","display_name":"Jim Kleewein","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/jim-kleewein-2395a3\" aria-label=\"Visit the profile page for Jim Kleewein\">Jim Kleewein<\/a>","is_active":true,"last_first":"Kleewein, Jim","people_section":0,"alias":"jim-kleewein"},{"type":"user_nicename","value":"Saravan Rajmohan","user_id":41039,"display_name":"Saravan Rajmohan","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/saravar\/\" aria-label=\"Visit the profile page for Saravan Rajmohan\">Saravan Rajmohan<\/a>","is_active":false,"last_first":"Rajmohan, Saravan","people_section":0,"alias":"saravar"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-scaled-960x540.jpg\" class=\"img-object-cover\" alt=\"Graphic shows framework of Privacy Preserving Machine Learning\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-scaled-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-1024x577.jpg 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-1536x865.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-scaled-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/11\/1400x788_PPML_graphic_no_logo_simplified_no_icons-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"November 9, 2021","formattedExcerpt":"Machine learning (ML) offers tremendous opportunities to increase productivity. However, ML systems are only as good as the quality of the data that informs the training of ML models. 
And training ML models requires a significant amount of data, more than a single individual or&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/793088","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/40735"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=793088"}],"version-history":[{"count":34,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/793088\/revisions"}],"predecessor-version":[{"id":878505,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/793088\/revisions\/878505"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/793244"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=793088"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=793088"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=793088"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=793088"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=793088"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=793088"},{"taxonomy":"msr-locale"
,"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=793088"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=793088"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=793088"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=793088"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=793088"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}