Keynote: Matching Methods and Natural Experiments: Examples of Causal Inferences from Social Media. Ingmar Weber (QCRI)
Invited Talk: Estimating causal effects at scale. Amit Sharma (Microsoft Research)
Lunch (on your own)
Tutorial: Intuition and basic methods for causal inference over social media data. Emre Kıcıman
- Estimating the Effect of Exercising on Users’ Online Behavior. Amin Mirlohi, Hawre Hosseini, Zeinab Noorian and Ebrahim Bagheri
- Qualitative Exploration and Early Discovery of Prior-Information for Causal Inference. Pablo Paredes
Keynote: Ruffled Feathers: When Can Gender Be Predicted on Social Networks? Johan Ugander (Stanford)
Matching Methods and Natural Experiments: Examples of Causal Inferences from Social Media
Ingmar Weber (QCRI)
Abstract: This talk will feature a mix of my own and others’ work, and will be part tutorial and part experience report on how to use two causal inference methodologies: natural experiments and matching methods. On the topic of natural experiments, I’ll present three pieces of work that use weather and its randomness as an instrumental variable to study social influence in social networks. First, we’ll look at how emotional contagion on Facebook can be studied in a non-intrusive manner [Coviello et al., PLoS ONE, 2014]. Next, we’ll discuss recent work that looks at social contagion in a large physical fitness social network [Aral & Nicolaides, Nature Communications, 2017]. Finally, I’ll present ongoing work in which we use weather to study the social contagion of ice-cream consumption [Mejova et al., ongoing]. For matching methods, I’ll first present work that uses propensity score matching to look at how feedback on news comments affects the commenter’s future behavior in the community [Cheng et al., ICWSM, 2014]. Next, I’ll present work that studies the effect of receiving social support in a weight loss forum on Reddit on the weight loss users ultimately report [Cunha et al., WWW-WebSci, 2017]. The presentations of all papers, including our own, will focus in particular on their shortcomings, hopefully leading to a lively discussion on both the promises and limitations of causal inference from observational data.
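As a rough illustration of the instrumental-variable idea behind the weather studies mentioned in the abstract (and not the method of any of the cited papers), the sketch below computes the ratio estimate cov(Z, Y) / cov(Z, X) on hand-made toy data. The names `rain`, `friend_activity`, `user_activity` and `iv_estimate`, and all of the numbers, are hypothetical and chosen only for this example.

```python
from statistics import mean


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def iv_estimate(instrument, treatment, outcome):
    """Ratio (Wald) estimator: cov(Z, Y) / cov(Z, X)."""
    denominator = covariance(instrument, treatment)
    if denominator == 0:
        raise ValueError("instrument is uncorrelated with the treatment")
    return covariance(instrument, outcome) / denominator


# Made-up toy data: rain in a friend's city (Z), the friend's activity (X),
# and the user's own activity (Y).
rain = [0, 0, 1, 1, 0, 1]
friend_activity = [5, 6, 2, 3, 5, 2]
user_activity = [4.0, 4.5, 2.5, 3.0, 4.0, 2.5]

effect = iv_estimate(rain, friend_activity, user_activity)
print(f"estimated effect of friend's activity on user's activity: {effect:.2f}")
```

The estimate is only meaningful under the usual instrumental-variable assumptions, for example that rain in the friend's city affects the user only through the friend's behavior.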
About Ingmar Weber: Ingmar is the Research Director of the Social Computing Group at the Qatar Computing Research Institute (QCRI). His interdisciplinary research uses large amounts of online data from social media and other sources to study human behavior at scale. Particular topics of interest include studying lifestyle diseases and population health, quantifying international migration using digital methods, and looking at political polarization and extremism. He has published over 100 peer-reviewed articles and his work is frequently featured in the popular press. He has been an ACM Distinguished Speaker since 2016.
As an undergraduate Ingmar studied mathematics at Cambridge University (1999-2003), before pursuing a PhD at the Max-Planck Institute for Computer Science (2003-2007). He subsequently held positions at the Ecole Polytechnique Federale de Lausanne (2007-2009) and Yahoo Research Barcelona (2009-2012), as well as a visiting researcher position at Microsoft Research Cambridge (summer 2008). He serves on a number of program committees for top-tier conferences in the domain of web data mining and social media analysis including ICWSM, KDD, WWW, ACL, ASONAM, WSDM, SDM, VLDB and WebSci, as well as on the editorial board for the Journal of Web Science.
Ruffled Feathers: When Can Gender Be Predicted on Social Networks?
Johan Ugander (Stanford University)
Abstract: Homophily, or “love of the same,” is a prominent and well-studied structural feature of social networks. Modern machine learning methods have exploited the empirical ubiquity of homophily to predict the private traits of individuals in social networks, with broad implications for both privacy research and statistical matching methods. However, gender inference on large-scale online networks introduces a unique challenge to conventional prediction methods as gender homophily can be weak or nonexistent. In this work we identify another useful structural property we call monophily that is characterized by an overdispersion of gender preferences beyond what can be explained by homophily. We jointly characterize the statistical structure of homophily and monophily in social networks in terms of preference bias and variance, and demonstrate that monophily can drive surprisingly accurate predictions in graphs where weak homophily might otherwise suggest a difficult classification problem. These findings offer an alternative perspective on network trait inference in general and gender in particular, complicating the already difficult task of protecting privacy in social networks, and introducing new considerations regarding any study of social network covariates. This is joint work with Kristen M. Altenburger.
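One rough way to picture "overdispersion beyond what can be explained by homophily" is to compare how much per-node same-gender neighbor rates actually vary with how much they would vary if every node drew its neighbors at a single pooled rate. The sketch below is only an illustrative diagnostic under that binomial assumption, not the estimator used in the work described above; the function name `same_gender_stats` and the toy counts are hypothetical.

```python
from statistics import pvariance


def same_gender_stats(same, degree):
    """same[i]: same-gender neighbors of node i; degree[i]: all its neighbors.

    Returns (pooled same-gender rate, observed variance of per-node rates
    around the pooled rate, variance expected if each node drew its
    neighbors independently at the pooled rate).
    """
    pooled = sum(same) / sum(degree)
    rates = [s / d for s, d in zip(same, degree)]
    observed = pvariance(rates, mu=pooled)
    expected = sum(pooled * (1 - pooled) / d for d in degree) / len(degree)
    return pooled, observed, expected


# Made-up toy network summary: six nodes with ten neighbors each.
degree = [10, 10, 10, 10, 10, 10]
same = [0, 10, 1, 9, 10, 0]

pooled, observed, expected = same_gender_stats(same, degree)
print(f"pooled rate {pooled:.2f}, observed var {observed:.3f}, expected var {expected:.3f}")
```

In this toy case the pooled rate sits at one half, so there is no homophily on average, yet individual nodes are far more one-sided than a single shared rate would produce.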
About Johan Ugander: Johan Ugander is an Assistant Professor at Stanford University in the Department of Management Science & Engineering, within the School of Engineering. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other large-scale social data, with an emphasis on methods for causal inference. Prior to joining the Stanford faculty he was a post-doctoral researcher at Microsoft Research Redmond from 2014 to 2015 and held an affiliation with the Facebook Data Science team from 2010 to 2014. He obtained his Ph.D. in Applied Mathematics from Cornell University in 2014.
Estimating causal effects at scale
Amit Sharma (Microsoft Research)
Abstract: As an increasing amount of daily activity, ranging from what we purchase to whom we talk to, shifts to online platforms, it is only natural to ask how those platforms impact our behavior. Estimating the causal impact of digital systems, however, is non-trivial without the luxury of running randomized experiments. In this talk, I will present methods for estimating the impact of systems from observed usage data, combining principles from the statistical data mining and causal inference literature. The first method provides an algorithm for automating the search for natural experiments, applicable whenever we can find a suitable auxiliary control variable. Using it on Amazon’s recommender system, I find that the causal impact of recommendations is less than half of the metrics typically reported for online systems. The second method utilizes the scale of online systems to estimate causal effects using extreme events. I will show its application to Bing, where it was used to investigate causal differences in user satisfaction for people from different demographics.
About Amit Sharma: Amit Sharma is a postdoctoral researcher at Microsoft Research, New York. His research focuses on understanding the underlying mechanisms that shape people’s activities online, with a particular emphasis on the effect of recommendation systems and social influence. More generally, his work contributes to methods for causal inference from large-scale data. He completed his Ph.D. in computer science at Cornell University. He has received a Best Paper Honorable Mention Award at the 2016 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW), the 2012 Yahoo! Key Scientific Challenges Award and the 2009 Honda Young Engineer and Scientist Award.
Estimating the Effect of Exercising on Users’ Online Behavior
Amin Mirlohi, Hawre Hosseini, Zeinab Noorian and Ebrahim Bagheri
Abstract: This study aims to estimate the influence of offline activity on users’ online behavior, relying on a matching method to reduce the effect of confounding variables. We analyze the activities of 850 users who are active on both Twitter and Foursquare. Users’ offline activity is extracted from their Foursquare posts and their online behavior from their Twitter posts. Users’ interests, representing their online behavior, are extracted with regard to a set of topics over several consecutive time intervals. The shift in users’ interests across time intervals is taken as a measure of behavior change on the social network. For offline activity, we treat user check-ins at a gym or fitness center as a sign of exercise. To estimate the effect of exercise on online behavior, we identify users who did not go to the gym for at least two months but then did so at least nine times in the following three months. We show that the shift in interests decreases significantly after users start exercising, which implies that the offline activity of exercising can influence how users’ interests are shaped and change on the social network over time.
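The selection rule described in the abstract (no gym visits for at least two months, then at least nine visits in the next three months) could be sketched roughly as follows. This is an assumption-laden illustration, not the authors' code: the function name `started_exercising`, the 60-day and 90-day approximations of "two months" and "three months", the reference date, and the example users are all hypothetical.

```python
from datetime import date, timedelta


def started_exercising(gym_checkins, start, quiet_days=60, active_days=90, min_visits=9):
    """Return True if there are no gym check-ins in the `quiet_days` before
    `start` and at least `min_visits` in the `active_days` beginning at `start`."""
    quiet_from = start - timedelta(days=quiet_days)
    active_until = start + timedelta(days=active_days)
    before = [d for d in gym_checkins if quiet_from <= d < start]
    after = [d for d in gym_checkins if start <= d < active_until]
    return not before and len(after) >= min_visits


start = date(2017, 3, 1)  # arbitrary reference date for the example

# Hypothetical users: one begins weekly visits at `start`,
# the other had already visited the gym ten days earlier.
new_exerciser = [start + timedelta(days=7 * k) for k in range(10)]
regular = [start - timedelta(days=10)] + new_exerciser

print(started_exercising(new_exerciser, start))
print(started_exercising(regular, start))
```

Users who pass such a filter would then be compared, before and after `start`, on the interest-shift measure described in the abstract.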
Qualitative Exploration and Early Discovery of Prior-Information for Causal Inference
Pablo Paredes, Philipp Dowling, Vasilis Oikonomou, Biye Jiang, Coye Cheshire, John Canny and James Landay
Abstract: In this paper, we propose a set of qualitative temporal exploration features for Inquire [Paredes et al.] to support observational studies and causal inference. Inquire is a system that supports qualitative or mixed-methods researchers across different fields in their insight discovery and ideation processes. Inquire enables its users to access large-scale corpora of text, and to perform sophisticated searches and exploration on this data through semantic vector space models that allow rich exploratory methods far beyond simple text search. When applied to the LiveJournal.com dataset, which consists of millions of anonymous journal entries, our system can help researchers compile data sets of users and their posts that capture specific semantic criteria. For instance, a researcher can create a data set of people who write about both “being stressed” and “feeling lonely,” and subsequently attempt to establish a temporal relationship between stress and loneliness (or vice versa). Inquire enables the researcher to perform interactive qualitative temporal analysis of these users, which can lead to the discovery of case studies. Furthermore, individual or aggregated analysis of these cases could result in the discovery of less obvious covariates. For example, a group of people under “stress” who feel “lonely” could potentially also have expressed in their journal entries some elements associated with “burnout” or “family conflicts.” In summary, we propose a tool for exploring social media data to expand the discovery of counterfactuals and covariates that would strengthen the theoretical and empirical informational space needed to develop robust causal inference studies.