Much research has focused on studying complex phenomena through their reflection in social media, from drawing neighborhood boundaries to inferring relationships between medicines and diseases. While it is generally recognized in the social sciences that such studies should be conditioned on gender, time, and other confounding factors, few of the studies that attempt to extract information from social media actually condition on such factors.
In this paper, we present a simple framework for specifying and implementing common social media analyses that makes it trivial to inspect and condition on such contextual information. Our data model, the discussion graph, captures both the structural features of relationships inferred from social media and the context of the discussions from which they are derived, such as who is participating in the discussions, when and where the discussions are occurring, and what else is being discussed in conjunction. We implement our framework in a tool called Q and present case studies of its use. In particular, we show how analyses of neighborhoods and their boundaries based on geo-located social media data can yield drastically different results when conditioned on gender and time (day/night and weekend/weekday).
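To make the idea of conditioning on context concrete, the following is a minimal sketch of one way a discussion graph might be represented, assuming each edge between co-mentioned entities carries a count for every context in which the co-mention occurred. It is not the interface of Q; the function names `build_discussion_graph` and `edge_weights`, the message fields, and the toy messages are all hypothetical and introduced only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_discussion_graph(messages, context_keys):
    """Map each co-mentioned entity pair to a Counter over context tuples.

    Hypothetical representation: an edge is a sorted pair of entities, and
    its annotation counts how often the pair co-occurred in each context.
    """
    graph = defaultdict(Counter)
    for msg in messages:
        context = tuple(msg[k] for k in context_keys)
        for a, b in combinations(sorted(set(msg["entities"])), 2):
            graph[(a, b)][context] += 1
    return graph


def edge_weights(graph, context_keys, **condition):
    """Return edge weights, counting only contexts that match every condition."""
    weights = {}
    for edge, contexts in graph.items():
        total = sum(
            n
            for ctx, n in contexts.items()
            if all(ctx[context_keys.index(k)] == v for k, v in condition.items())
        )
        if total:
            weights[edge] = total
    return weights


# Toy messages, invented purely for illustration.
messages = [
    {"entities": ["cafe", "park"], "gender": "f", "period": "day"},
    {"entities": ["cafe", "park"], "gender": "m", "period": "night"},
    {"entities": ["park", "bar"], "gender": "m", "period": "night"},
]
keys = ["gender", "period"]
graph = build_discussion_graph(messages, keys)

all_weights = edge_weights(graph, keys)                    # unconditioned
night_weights = edge_weights(graph, keys, period="night")  # conditioned on time
```

In this sketch the same graph yields different edge weights once a context condition is applied, which is the kind of variation the case studies examine for gender and time.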