Microsoft Research Blog

Exploring the Biases of Big Data

April 2, 2013 | Posted by Microsoft Research Blog

Posted by Rob Knies

On Feb. 28, at the Santa Clara (Calif.) Convention Center, Kate Crawford, principal researcher at Microsoft Research New England, took the stage during the Strata Conference to deliver an illuminating, 17-minute talk entitled Algorithmic Illusions: Hidden Biases of Big Data.

During that presentation, she cautioned that data and collections of data are not objective. They are created and shaped by human beings, and understanding the unavoidable hidden biases people bring to data collection and analysis can be as significant as the data themselves.

Now, on the heels of that appearance, Crawford is bringing a similar message to a different audience, that of the Harvard Business Review, which has just published her contributed article,

“It couldn’t give us insight about the experiences in areas where people were cut off from telecommunications and power—or simply not using Twitter,” adds Crawford, also a visiting professor at the MIT Center for Civic Media.

“Then, in mid-2011, danah boyd and I co-authored a paper for the Oxford Internet Institute’s conference that articulated some of our concerns about big data and social media. There was very little around at the time that was asking critical questions of how big data was being used.”

The specific example of the Queensland floods would seem to point to the Twitter platform as being the culprit, but that’s not necessarily the case.

“Social-media data is one small slice of all the data that is out there,” Crawford says. “The same can be said for sensor data. But these are just examples of a bigger problem: Data sets from any source will have gaps and problems. There is no such thing as a data set that is untouched by human design: We decide what counts as data and what does not. Or, as Lisa Gitelman [media historian at New York University] has described, ‘Data need to be imagined as data to exist.’

“Big data is still subjective. It is still informed by disciplinary perspectives and the ever-changing histories of knowledge. Regardless of where the data come from, it’s useful to ask about the grounding assumptions, the methods, and the possible errors.”

Crawford offers a novel mechanism for enhancing the value of big data.

“Multidimensional data—data with depth, as I call it—can come from using mixed research methodologies: combining big-data analytics with small-data studies that bring out the depth, nuance, and context that big data often misses. Small data can also produce rich insights and different perspectives that are left out or are unreachable by big-data studies.

“But above all, social-science approaches help us to ask productive questions about data to prevent us from falling victim to our own cognitive biases that often suggest answers we expect or lead us to results we wish to find.”