Government, policy, and the era of Big Data

28 August 2014 | Ted Malone, Big Data Architecture lead for Microsoft Federal

Big Data holds tremendous promise for society. But we won't learn anything from it if it's locked up in government or private silos. It has to be broadly available so it can be analyzed. Government agencies can play a role by crafting policies that balance availability with security.

What would that look like? Think about data that relates to people. To help protect sensitive personal information, government agencies should strengthen—but also adapt—privacy laws to enable the collection and use of large datasets while preserving privacy. This will entail tradeoffs on which reasonable people may have widely varying views. In determining how best to address privacy concerns, government agencies can draw upon a full range of tools—not only law, but also technology, articulated best practices, and technical standards.

From our perspective at Microsoft, here are the big-picture issues surrounding a government agency’s use of Big Data:

The promise of Big Data

Technological advances have led to the digitization of massive amounts of information across the private and public sectors. The costs of collecting, retaining, aggregating, and analyzing all this data have come down, which makes it easier and more affordable to detect patterns and improve the quality of prediction. Here are some real-world examples:

  • A larger dataset helped Microsoft raise the accuracy of a grammar checker in Microsoft Word from 75 to 95 percent. You can read about this effort in a paper by Michele Banko and Eric Brill on natural language disambiguation, presented to the Association for Computational Linguistics.
  • Microsoft teams worked with doctors researching HIV mutation, applying analytical methods originally developed to fight email spam, whose patterns of mutation and adaptation have been likened to those seen in the human immune system. You can read more in an article in Fast Company.
  • Bing search data yielded valuable clues that led Stanford University researchers to discover a potential drug interaction between two medications that could be dangerous when taken together by diabetes patients. Read the report discussing web-scale pharmacovigilance by Ryen White, Nicholas Tatonetti, and Nigam Shah.

The broad availability of Big Data

As both a source and a steward of data (for fiscal, social, economic, environmental, and infrastructure programs), governments can play multiple roles to ensure that society realizes the benefits of Big Data while helping protect other important values. The challenge for government agencies is to create policies that balance access with appropriate limits so that data can support societal goals. When aggregating Big Data, governments should employ approaches that minimize privacy risks to individuals while honoring their duty to "promote the progress of science and useful arts" (the constitutional basis for copyright law).

Strengthening and adapting privacy regulation

Some large datasets will relate to people, which introduces a risk of data misuse. I believe that the promise of Big Data will not be realized unless approaches are established to address privacy and civil liberties concerns. Privacy regulation in the U.S. is complex and is showing signs of strain under the weight of Big Data. The notice and consent paradigm, and privacy regulation as a whole, should be strengthened and adapted to a Big Data world. This position rests on two problems:

  1. Individuals bear the burden of privacy protection. Consumers confronted with privacy statements from vendors tend to "agree" to legal terms they haven't read. The letter of the law is satisfied, but the resulting privacy protection is weak.
  2. While the existing paradigm may fail to provide real protection, it may also preclude beneficial uses of data that would present little privacy risk.

Government agencies should look for ways to focus notice and consent on areas where individuals can make genuinely informed, meaningful decisions and where privacy concerns are significant.

Protecting privacy in a world of Big Data

Unlocking the value of Big Data while helping protect the privacy of the people whose data it contains requires a multifaceted solution. As I stated in yesterday's post, government should look to technology, best practices, law, and standards to help address this challenge.

Why should government care about Big Data? Because it's everywhere. The more actively a government crafts policies that use, protect, and respect this wealth of digital information, the more we can put Big Data to work for societal good.

Have a comment or opinion on this post? Let me know @Microsoft_Gov.
