Privacy, Big Data and the role of government

27 August 2014 | Ted Malone, Big Data Architecture lead for Microsoft Federal

In a previous post I wrote about the challenges and opportunities facing governments in the era of Big Data. There is much to be said on this complex topic and Microsoft will continue to be engaged in all aspects of this evolving conversation. For the purposes of this post, I will focus on a single critical issue: privacy. To be more specific, I’d like to offer our perspective on a few ways we think government agencies can help protect the privacy of citizens who live and work in a Big Data world.

  • Technology. Data privacy has traditionally been concerned with the collection, use, and disclosure of data that identifies individuals. Privacy frameworks have generally divided data into two categories, data that identifies individuals and data that does not, on the assumption that most data falls cleanly into one category or the other. In a world of Big Data, we will likely need to abandon this binary model.

    It may be best to treat the identifiability of data as a continuum. Government agencies should promote research on defining and advancing pseudonymization and de-identification techniques, which would help to address privacy concerns while enabling Big Data analysis. A technique called “differential privacy” is also promising: rather than removing or altering information within a database, it limits access to the underlying data and adds carefully calibrated random noise to the answers returned by queries, so that results reveal little about any one individual. Finally, governments should also explore greater use of cryptographic technologies. De-identification, for example, often involves cryptographically hashing information that directly identifies individuals, such as account numbers.

    Many promising techniques for de-identification, encryption, and differential privacy are still only in the research stage and further work to commercialize them should be encouraged. By recognizing the potential for misuse of Big Data and the risks of various levels of de-identification, policymakers could help society quantify the value that these technologies offer. That, in turn, would likely encourage industry investments in those potentially useful technologies.

  • Best Practices. Technologies should be supplemented by best practices, in both the public and private sectors, regarding the use of those technologies. Such best practices would help to guard against systemic attacks or other attempts to circumvent privacy protection. Best practices should be standardized and accompanied by robust audit mechanisms that focus on the controls that organizations put in place. Technology has a role to play in standardizing and enforcing these policies, including by attaching, or “tagging,” policy requirements directly to datasets.

  • Law. Legal requirements should also be in place to back up certain technological solutions. Uniform legal rules could help address the kinds of risks introduced when people do not know the identities and/or roles of the organizations that have access to their personal data. Any such rules should, of course, be carefully crafted to avoid locking in technologies now that may need to be changed quickly as knowledge of Big Data and its privacy implications progresses.

  • Standards. One way to provide technological flexibility is to craft law that relies upon industry technical standards. The Federal Information Security Management Act of 2002 (FISMA) is an example of this approach. To help implement FISMA, the National Institute of Standards and Technology (NIST) issued NIST Special Publication (SP) 800-53, which 1) defines a risk management process, 2) specifies the risks that stakeholders must consider, and 3) provides lists of effective mitigations. A standard for Big Data and privacy should seek to continuously improve how risks are addressed and to adapt to new risks, as FISMA and NIST SP 800-53 have done. Such a standard should describe a process of assessing risk, taking measures to reduce risk, assessing the effectiveness of the measures, and then returning to reassess risk. Finally, a successful standard should require any organization that follows it to document how it does so and to submit to third-party validation.
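
To make two of the techniques mentioned under Technology more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended implementation. The function names, the secret key, and the epsilon value are all hypothetical. The first function pseudonymizes an account number with a keyed hash (HMAC-SHA256) instead of a plain hash, because a plain hash of a short identifier can often be reversed by simply hashing every possible value. The second answers a counting query with Laplace noise, which is one common way to achieve differential privacy.

```python
import hashlib
import hmac
import random


def pseudonymize(account_number: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same input and key always give the same pseudonym, so records
    can still be linked, but the mapping cannot be rebuilt without the key.
    """
    digest = hmac.new(secret_key, account_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. Smaller epsilon means more noise and
    stronger privacy.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    key = b"hypothetical-secret-key"   # assumption: stored and rotated securely
    print(pseudonymize("1234567890", key))

    # Assumption: a real system would use a vetted, secure noise source
    # rather than Python's general-purpose random module.
    rng = random.Random()
    print(private_count(true_count=412, epsilon=0.5, rng=rng))
```

In this sketch the analyst never sees the raw account number or the exact count. How the key is protected and how the privacy budget (epsilon) is chosen are exactly the kinds of decisions that the best practices and standards described above would need to govern.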

Privacy is everyone’s concern. But taking steps to secure data, and so protect privacy, can also be a concern of government. In tomorrow’s post, I’ll discuss how government policy can balance data availability with data security, so the benefits of Big Data can be realized without jeopardizing privacy.

Have a comment or opinion on this post? Let me know @Microsoft_Gov. Have a question for the author? Please e-mail us at
