IMS-Microsoft Research Workshop: Foundations of Data Science – Online, Opt-in Surveys: Fast, Cheap, and Mostly Accurate

Session Chair Intro: Computational Social Science – David Dunson, Duke University

David Rothschild, Microsoft Research

Online, Opt-in Surveys: Fast, Cheap, and Mostly Accurate

We explore varying methods of collecting survey data and of transforming raw survey data into answers. We reject the standard construct that survey data is either “probability” or “non-probability” and, consequently, accurate or inaccurate. All survey data collection lies on a continuum that runs from the theoretically perfect probability sample (or a complete census) to an extremely biased opt-in sample, and all raw survey data is transformed into a set of answers with methodology that runs from simple traditional weighting to modern statistically derived methods. The closer survey data collection is to a perfect probability sample, the more expensive it is to collect (in time and/or money), but the less data is needed to reach an equivalent level of accuracy. We compare and contrast the results of four different surveys that ask a series of overlapping questions and whose data collection falls at different points on the continuum, from extremely rigorous traditional probability sampling to a fully opt-in sample design. We show that, on a series of general-interest and public-policy questions, properly transformed data from the opt-in sample differs from the traditional surveys by about as much as the traditional surveys differ from one another. Fast and cheap data collection does not produce ground truth, but it does produce data that we can transform into a mostly accurate set of answers.
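As a rough illustration of the "simple traditional weighting" end of that methodological range, the sketch below applies post-stratification to a toy opt-in sample. It is not the method used in the talk. The function name `poststratify`, the age cells, the responses, and the population shares are all hypothetical and made up for the example.

```python
from collections import defaultdict


def poststratify(responses, population_shares):
    """Estimate a population mean by reweighting per-cell sample means.

    responses: iterable of (cell, value) pairs from the sample.
    population_shares: dict mapping cell -> share of the target population.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell, value in responses:
        totals[cell] += value
        counts[cell] += 1

    # Only cells that appear in the sample can contribute, so their
    # population shares are renormalized to sum to one.
    covered = {c: s for c, s in population_shares.items() if counts.get(c, 0) > 0}
    covered_share = sum(covered.values())
    if covered_share == 0:
        raise ValueError("no sampled cell matches the population table")

    return sum(
        (share / covered_share) * (totals[c] / counts[c])
        for c, share in covered.items()
    )


# Hypothetical opt-in sample that over-represents younger respondents.
# Each pair is (age cell, 1 if the respondent agrees else 0).
sample = (
    [("18-34", 1)] * 6 + [("18-34", 0)] * 2
    + [("35+", 1)] * 1 + [("35+", 0)] * 1
)

# Hypothetical population shares for the same cells.
population = {"18-34": 0.3, "35+": 0.7}

raw_mean = sum(value for _, value in sample) / len(sample)
adjusted = poststratify(sample, population)

print(f"raw mean: {raw_mean:.3f}")
print(f"post-stratified estimate: {adjusted:.3f}")
```

In this toy data the raw mean overstates agreement because the over-sampled younger cell agrees more often. Reweighting each cell mean by its assumed population share pulls the estimate toward the under-sampled cell. Model-based approaches go further by smoothing estimates for sparse cells, which this sketch does not attempt.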

Speaker Details

David Rothschild is an economist at Microsoft Research New York City. He has a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania. His primary body of work is on forecasting and understanding public interest and sentiment. Related work examines how the public absorbs information. He has written extensively, in both the academic and popular press, on polling, prediction markets, and predictions of upcoming events; most of his popular work has focused on predicting elections and on an economist’s take on public policy. Since joining Microsoft Research in May, he has been busy building prediction and sentiment models, as well as organizing novel/experimental polling and prediction games. In February 2012, he correctly predicted 50 of 51 Electoral College outcomes in the U.S. presidential election held the following November.

David Dunson and David Rothschild
Duke University, Microsoft