I am a Principal Researcher at Microsoft Research AI in the information and data sciences group.

My current research focuses on causal analysis of large-scale social media timelines, with the vision of making causal question-answering as fast and as common as web search. With hundreds of millions of people publicly reporting on their daily experiences, we can mine these social media streams to better understand the common and critical situations people are in, the actions they take, and their implications. These inferences are useful for many applications, including decision support tools for individuals and analytics to support policy-makers and scientists.

More broadly, I am interested in using social data to help people find what they want and need. To this end, my work drives towards the goal of extracting from social media useful models of how people behave in the world: people's actions in the world, people's interactions with each other, and the consequences of people's decisions. My research addresses three areas:

Foundations and infrastructure for better social media analysis: I build tools and frameworks to make it easier and faster for people to deeply analyze and explore social media. This includes developing best practices for analysis and making them easy to follow.
Connecting social media to the real world: To interpret social media, I work on entity linking and study the systematic biases inherent in social media's reflection of the world.
Social systems engineering: I study how the affordances and incentives provided by social systems affect the kinds of information we find in social media.

My previous research interests include JavaScript application monitoring and optimization, as well as improving the reliability of Internet services architectures and operations. I received my Ph.D. and my M.S. from Stanford University, and my B.S. in Electrical Engineering and Computer Science from U.C. Berkeley.


DSoAP – Distributed Social Analytics Platform

Established: June 1, 2015

The Distributed Social Analytics Platform (DSoAP) project is focused on the “Huge Data” problem in social policy research caused by the breadth of data involved. Using aggregate social media data to investigate and validate social issues (such as employment, health and fiscal policy) requires analyzing many months or years of data. DSoAP is applying intelligent compaction, pre-indexing and distribution of data across a server cluster to achieve responsive query times for online data exploration. Twitter…

Discussion Graph Tool

Established: April 25, 2014

Discussion Graph Tool (DGT) is an easy-to-use analysis tool that provides a domain-specific language for extracting co-occurrence relationships from social media and automates the tasks of tracking the context of relationships and other best practices. DGT provides a single-machine implementation, and also generates map-reduce-like programs for distributed, scalable analyses. DGT simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data. With just 3-4 simple lines of script, you…

Online and Social Media Data as a Flawed Continuous Panel Survey

Established: April 9, 2014

If search and Twitter data are to be treated as a survey, they would follow a very peculiar methodology: participation is a time-varying, demographically biased sample of the population, participants are effectively continuously answering different “survey” questions, and, finally, participants can choose how often they are allowed to answer the question. In response, we show alternative methods for interpreting and using online and social media data fruitfully. There is a large body of research on…


Doloto

Established: February 7, 2008

Doloto stands for Download Time Optimizer and is also the Russian word for chisel.

Ajax View

Established: April 30, 2007

Ajax View enables developers to see and control the behaviors of their web applications on users' desktops. News, April 29, 2009: The technology in Ajax View is now available as a Power Tool: Microsoft Visual Studio AJAX Profiling Extensions. This power tool includes a server-side extension to IIS to add profiling code to your JavaScript web applications, and a Visual Studio add-in to investigate this data with Visual Studio's Performance Explorer.


Social Computing


May 22, 2014


Barbara Poblete, Emre Kiciman, and Fernando Diaz


University of Chile, Microsoft


Longitudinal Tweet ID dataset for a selection of Health, Social, and Business Experiences

April 2017

This data set consists of the tweet IDs collected for the propensity-score analysis of longitudinal social media messages posted by people who mention specific health, social and business domains. This data set accompanies the paper, “Distilling the Outcomes of Personal Experiences: A Propensity-scored Analysis of Social Media.”

Election 2012 Tweet ID dataset

January 2016

Discussion Graph Tool

June 2014

Social Web Experience

September 2009

Ajax View JavaScript Instrumentation Proxy

July 2007
