OSSM17: Observational Studies Through Social Media

Program

11:00am-12:30pm Keynote: Matching Methods and Natural Experiments: Examples of Causal Inferences from Social Media.  Ingmar Weber, QCRI

Invited Talk.  Estimating causal effects at scale. Amit Sharma (Microsoft Research)

12:30-2:00pm Lunch (on your own)
2:00-3:30pm Tutorial: Intuition and basic methods for causal inference over social media data. Emre Kıcıman

Talks

  • Estimating the Effect of Exercising on Users’ Online Behavior. Amin Mirlohi, Hawre Hosseini, Zeinab Noorian and Ebrahim Bagheri
  • Qualitative Exploration and Early Discovery of Prior-Information for Causal Inference. Pablo Paredes
3:30-4:00pm Afternoon break
4:00-5:00pm Keynote: Ruffled Feathers: When Can Gender Be Predicted on Social Networks? Johan Ugander, Stanford

 

Details

Matching Methods and Natural Experiments: Examples of Causal Inferences from Social Media
Ingmar Weber (QCRI)

Abstract: This talk will feature a mix of my own and others’ work, and will be part tutorial and part experience report on how to use two causal inference methodologies: natural experiments and matching methods. On the topic of natural experiments, I’ll present three pieces of work that use weather and its randomness as an instrumental variable to study social influence in social networks. First, we’ll look at how emotional contagion on Facebook can be studied in a non-intrusive manner [Coviello et al., PLoS ONE, 2014]. Next, we’ll discuss recent work that looks at social contagion in a large physical fitness social network [Aral & Nicolaides, Nature Communications, 2017]. Finally, I’ll present ongoing work in which we use weather to study the social contagion of ice-cream consumption [Mejova et al., ongoing]. For matching methods, I’ll first present work that uses propensity score matching to look at how feedback on news comments affects the commenter’s future behavior in the community [Cheng et al., ICWSM, 2014]. Next, I’ll present work that studies the effect of receiving social support in a weight loss forum on Reddit on the actual weight loss reported in the end [Cunha et al., WWW-WebSci, 2017]. The presentations of all papers, including our own, will focus in particular on their shortcomings, hopefully leading to a lively discussion on both the promises and limitations of causal inference from observational data.

About Ingmar Weber: Ingmar is the Research Director of the Social Computing Group at the Qatar Computing Research Institute (QCRI). His interdisciplinary research uses large amounts of online data from social media and other sources to study human behavior at scale. Particular topics of interest include studying lifestyle diseases and population health, quantifying international migration using digital methods, and looking at political polarization and extremism. He has published over 100 peer-reviewed articles and his work is frequently featured in the popular press. Since 2016 he has been an ACM Distinguished Speaker.

As an undergraduate Ingmar studied mathematics at Cambridge University (1999-2003), before pursuing a PhD at the Max-Planck Institute for Computer Science (2003-2007). He subsequently held positions at the Ecole Polytechnique Federale de Lausanne (2007-2009) and Yahoo Research Barcelona (2009-2012), as well as a visiting researcher position at Microsoft Research Cambridge (summer 2008). He serves on a number of program committees for top-tier conferences in the domain of web data mining and social media analysis including ICWSM, KDD, WWW, ACL, ASONAM, WSDM, SDM, VLDB and WebSci, as well as on the editorial board for the Journal of Web Science.


Ruffled Feathers: When Can Gender Be Predicted on Social Networks?
Johan Ugander (Stanford University)

Abstract: Homophily, or “love of the same,” is a prominent and well-studied structural feature of social networks. Modern machine learning methods have exploited the empirical ubiquity of homophily to predict the private traits of individuals in social networks, with broad implications for both privacy research and statistical matching methods. However, gender inference on large-scale online networks introduces a unique challenge to conventional prediction methods as gender homophily can be weak or nonexistent. In this work we identify another useful structural property we call monophily that is characterized by an overdispersion of gender preferences beyond what can be explained by homophily. We jointly characterize the statistical structure of homophily and monophily in social networks in terms of preference bias and variance, and demonstrate that monophily can drive surprisingly accurate predictions in graphs where weak homophily might otherwise suggest a difficult classification problem. These findings offer an alternative perspective on network trait inference in general and gender in particular, complicating the already difficult task of protecting privacy in social networks, and introducing new considerations regarding any study of social network covariates. This is joint work with Kristen M. Altenburger.

About Johan Ugander: Johan Ugander is an Assistant Professor at Stanford University in the Department of Management Science & Engineering, within the School of Engineering. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other large-scale social data, with an emphasis on methods for causal inference. Prior to joining the Stanford faculty he was a post-doctoral researcher at Microsoft Research Redmond 2014-2015 and held an affiliation with the Facebook Data Science team 2010-2014. He obtained his Ph.D. in Applied Mathematics from Cornell University in 2014.


Estimating causal effects at scale
Amit Sharma (Microsoft Research)

Abstract: As an increasing amount of daily activity—ranging from what we purchase to whom we talk to—shifts to online platforms, it is only natural to ask how those platforms impact our behavior. Estimating the causal impact of digital systems, however, is non-trivial without the luxury of doing randomized experiments. In this talk, I will present methods for estimating the impact of systems from observed usage data, combining principles from statistical data mining and causal inference literature. The first method provides an algorithm for automating the search for natural experiments, whenever we can find a suitable auxiliary control variable. Using it on Amazon’s recommender system, I find that the causal impact of recommendations is less than half of the metrics typically reported for online systems. The second method utilizes the scale of online systems to compute causal effect using extreme events. I will show its application to Bing, where it was used to investigate causal differences in user satisfaction for people from different demographics.

About Amit Sharma: Amit Sharma is a postdoctoral researcher at Microsoft Research, New York. His research focuses on understanding the underlying mechanisms that shape people’s activities online, with a particular emphasis on the effect of recommendation systems and social influence. More generally, his work contributes to methods for causal inference from large-scale data. He completed his Ph.D. in computer science at Cornell University. He has received a Best Paper Honorable Mention Award at the 2016 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), the 2012 Yahoo! Key Scientific Challenges Award and the 2009 Honda Young Engineer and Scientist Award.


Estimating the Effect of Exercising on Users’ Online Behavior
Amin Mirlohi, Hawre Hosseini, Zeinab Noorian and Ebrahim Bagheri

Abstract: This study aims to estimate the influence of offline activity on users’ online behavior, relying on a matching method to reduce the effect of confounding variables. We analyze activities of 850 users who are active on both the Twitter and Foursquare social networks. Users’ offline activity is extracted from Foursquare posts and users’ online behavior is extracted from Twitter posts. Users’ interests, representing their online behavior, are extracted with regard to a set of topics in several successive time intervals. The shift of users’ interests across different time intervals is taken as a measure of user behavior change on the social network. On the other hand, we employ user check-ins at a gym or fitness center as a sign of exercise and consider it to be an offline activity. In order to find the effect of exercise on online behavior, we identify users who did not go to the gym for at least two months but did so at least nine times in the next three months. We show that the shift in interest decreases significantly for users after they start exercising, which implies that the offline activity of exercising can influence how users’ interests are shaped and change on the social network over time.
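
The selection rule in the abstract (no gym check-ins for at least two months, then at least nine in the next three months) can be sketched in code. This is only an illustration and not the authors' implementation: the abstract does not give exact window definitions, so the sketch assumes 30-day months, counts at most one visit per calendar day, and uses hypothetical names (`find_exercise_starters`, `observed_from`). It covers only the choice of the "started exercising" group; the matching step is not shown.

```python
from collections import defaultdict
from datetime import date, timedelta

QUIET_DAYS = 60   # "at least two months" without gym check-ins (assumes 30-day months)
ACTIVE_DAYS = 90  # "the next three months" (assumes 30-day months)
MIN_VISITS = 9    # "at least nine times"


def find_exercise_starters(gym_checkins, observed_from):
    """Return {user_id: start_date} for users who match the selection rule.

    gym_checkins: iterable of (user_id, date) pairs, gym/fitness check-ins only.
    observed_from: first date covered by the data (hypothetical parameter), so a
    quiet period only counts if it was actually observed.
    """
    by_user = defaultdict(set)
    for user_id, day in gym_checkins:
        by_user[user_id].add(day)  # assumption: one visit per calendar day

    starters = {}
    for user_id, day_set in by_user.items():
        days = sorted(day_set)
        for i, start in enumerate(days):
            # The 60 days before `start` must lie inside the observed period.
            if start - timedelta(days=QUIET_DAYS) < observed_from:
                continue
            # Quiet period: no check-in on any of the 60 days before `start`.
            if i > 0 and (start - days[i - 1]).days <= QUIET_DAYS:
                continue
            # Active period: visits on days in [start, start + 90 days).
            window_end = start + timedelta(days=ACTIVE_DAYS)
            visits = sum(1 for d in days[i:] if d < window_end)
            if visits >= MIN_VISITS:
                starters[user_id] = start
                break
    return starters
```

Requiring the quiet window to fall inside the observed period is a design choice of this sketch: without it, a user whose earlier check-ins simply predate the data would be counted as newly starting to exercise.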


Qualitative Exploration and Early Discovery of Prior-Information for Causal Inference
Pablo Paredes, Philipp Dowling, Vasilis Oikonomou, Biye Jiang, Coye Cheshire, John Canny and James Landay

Abstract: In this paper, we propose a set of qualitative temporal exploration features for Inquire [Paredes et al.] to support observational studies and causal inference. Inquire is a system that supports qualitative or mixed-methods researchers across different fields in their insight discovery and ideation processes. Inquire enables its users to access large-scale corpora of text, and perform sophisticated searches and exploration on this data through semantic vector space models that allow rich exploratory methods far beyond simple text search. When applied to the LiveJournal.com dataset, which consists of millions of anonymous journal entries, our system can help researchers compile data sets of users and their posts that capture specific semantic criteria. For instance, a researcher can create a data set of people who write about both “being stressed” and “feeling lonely,” to attempt to subsequently establish a temporal relationship between stress and loneliness (or vice-versa). Inquire enables the researcher to perform interactive qualitative temporal analysis of these users, which can lead to the discovery of case studies. Furthermore, individual or aggregated analysis of these cases could result in the discovery of less-obvious covariates. For example, a group of people under “stress” who feel “lonely” could potentially also have expressed in their journal entries some elements associated with “burnout” or “family conflicts.” In summary, we propose a tool for exploration of social-media data to expand the discovery of counterfactuals and covariates that would strengthen the theoretical and empirical informational space needed to develop robust causal inference studies.

Description

News! We are happy to announce our two keynote speakers, Johan Ugander and Ingmar Weber!

The OSSM workshop will focus on observational studies (namely, causal analyses) of data traces created by people, whether directly or indirectly, through their interactions with social networks, messaging services, and other applications and devices.

Human generated content in general, including social media, has been shown to be a rich repository of data for observational studies across many areas: public health, with research on prevalence of disease and on the effects of media on the development of disease; medicine, showing the ability to detect mental disease in individuals using social media; education, to optimize teaching and exams; and sociology, to prove theories previously tested on very small populations. These studies have been conducted from data including social media, search engine logs, location traces, and other forms of human generated content.

While many past studies showed a correlation between variables of interest, some studies were able to show causal relationships through natural experiments and careful analyses. Natural experiments are empirical studies in which people are assigned to control or experimental conditions according to factors not under the control of investigators, where these factors resemble a random assignment. With Internet data, natural experiments occur frequently through policy changes, outside influences, and A/B testing. Our workshop will focus on all aspects of causal inference from human generated content, with studies that developed novel methods of identifying and using natural experiments or other methods for inferring causality.

Studies based on human generated content require investigators to preserve the privacy of users and abide by proper ethical codes. We will encourage discussion of how these goals may be attained in secondary analysis of data such as that used in the aforementioned studies.

We are soliciting participation from both domain experts interested in specific applications of methods and methods experts.

Topics of interest broadly include:

  • Applications and domain-specific explorations
  • Interpreting user-generated data, including text, structured data, and temporal data
  • Causal analyses in social media, for example, through conditioning analyses, instrumental variables or regression discontinuity analysis
  • Identifying natural experiments and using them to understand causal inferences
  • Identifying and/or mitigating population, behavioral and other systematic biases in social media
  • Novel methods for preserving privacy in such experimentation
  • Ethical implications of such experiments
  • Qualitative, experimental and other evaluation and validation methods for observational study results

General Format

The core of the workshop will consist of structured discussion and breakouts on topics preselected by participants.  Broad discussion topics will be chosen based on initial submissions, and refined during a pre-workshop video chat open to all workshop participants several weeks prior to the workshop.  Potential outcomes of these breakouts will include a position paper, or potential research directions/ideas that could lead to interdisciplinary collaborations, particularly between computer and social scientists. We will support breakout groups with a facilitator as well as a scribe.

Submissions

Important dates

  • March 24, 2017: Deadline for abstract/paper submissions
  • April 14, 2017: Notification of acceptance
  • May 15, 2017: Workshop day, at ICWSM 2017, Montreal, Canada

Abstract details

Submissions should be extended abstracts of up to 4 pages and will be published on the workshop webpage and optionally (depending on the authors’ choice) in the ICWSM/AAAI workshop proceedings.  Submissions will be evaluated by the program committee based on the quality of the work and its fit to the workshop themes.  We explicitly encourage the submission of preliminary work.

Authors whose papers are accepted to the workshop will be able to give a talk about their work (10min) and have the opportunity to participate in a poster session. In addition, the workshop will consist of structured discussion and breakout groups on topics pre-selected by participants. Potential outcomes of these discussions include a position paper, or potential research directions/ideas that could lead to interdisciplinary collaborations, particularly between computer and social scientists.

Authors are encouraged to use the AAAI-17 formatting guidelines.  The author kit is available at http://www.aaai.org/Publications/Templates/AuthorKit17.zip.

Organization

Organizing Committee

Please contact us if you have any questions.

Program Committee

  • Carlos Castillo, Eurecat
  • Ingemar Cox, University College London
  • Aron Culotta, Illinois Institute of Technology
  • Luis Fernandez Luque, Qatar Computing Research Institute
  • Shawndra Hill, Microsoft Research
  • Brian Keegan, University of Colorado Boulder
  • Alexandra Olteanu, IBM Research
  • Daniel Romero, University of Michigan
  • Amit Sharma, Microsoft Research
  • Ingmar Weber, Qatar Computing Research Institute

Call for Contributions

Held at ICWSM 2017 (International AAAI Conference on Web and Social Media), Montreal, Canada, May 15, 2017

Workshop webpage: https://www.microsoft.com/en-us/research/event/ossm17/

The OSSM workshop will focus on observational studies (namely, causal analyses) of data traces created by people, whether directly or indirectly, through their interactions with social networks, messaging services, and other applications and devices.

Human generated content in general, including social media, has been shown to be a rich repository of data for observational studies across many areas: public health, with research on prevalence of disease and on the effects of media on the development of disease; medicine, showing the ability to detect mental disease in individuals using social media; education, to optimize teaching and exams; and sociology, to prove theories previously tested on very small populations. These studies have been conducted from data including social media, search engine logs, location traces, and other forms of human generated content.

While many past studies showed a correlation between variables of interest, some studies were able to show causal relationships through natural experiments and careful analyses. Natural experiments are empirical studies in which people are assigned to control or experimental conditions according to factors not under the control of investigators, where these factors resemble a random assignment. With Internet data, natural experiments occur frequently through policy changes, outside influences, and A/B testing. Our workshop will focus on all aspects of causal inference from human generated content, with studies that developed novel methods of identifying and using natural experiments or other methods for inferring causality.

Studies based on human generated content require investigators to preserve the privacy of users and abide by proper ethical codes. We will encourage discussion of how these goals may be attained in secondary analysis of data such as that used in the aforementioned studies.

We are soliciting participation from both domain experts interested in specific applications of methods and methods experts.

Topics of interest include, but are not limited to:

  • Applications and domain-specific explorations
  • Interpreting user-generated data, including text, structured data, and temporal data
  • Causal analyses in social media, for example, through conditioning analyses, instrumental variables or regression discontinuity analysis
  • Identifying natural experiments and using them to understand causal inferences
  • Identifying and/or mitigating population, behavioral and other systematic biases in social media
  • Novel methods for preserving privacy in such experimentation
  • Ethical implications of such experiments
  • Qualitative, experimental and other evaluation and validation methods for observational study results

Submissions should be extended abstracts of up to 4 pages and will be published on the workshop webpage and optionally (depending on the authors’ choice) in the ICWSM/AAAI workshop proceedings. We explicitly encourage the submission of preliminary work.

Authors whose papers are accepted to the workshop will be able to give a talk about their work (10min) and have the opportunity to participate in a poster session. In addition, the workshop will consist of structured discussion and breakout groups on topics pre-selected by participants. Potential outcomes of these discussions include a position paper, or potential research directions/ideas that could lead to interdisciplinary collaborations, particularly between computer and social scientists.

KEY DATES

  • Submission deadline: March 24, 2017
  • Author feedback: April 14, 2017
  • Workshop event (Montreal, Canada): May 15, 2017

Please see workshop webpage for formatting and submission instructions.

ORGANIZATION

Tim Althoff, Stanford University

Elad Yom-Tov, Microsoft Research

Munmun De Choudhury, Georgia Tech

Emre Kıcıman, Microsoft Research

CONTACT

Please direct your questions to ossm-workshop@googlegroups.com