Human activity is one of the most important pieces of context affecting an individual’s information needs. Understanding the relationship between activities, time, location, and other contextual features can improve the quality of various intelligent systems, including contextual search engines, task managers, digital personal assistants, chatbots, and recommender systems.
In this work, we propose a method for extracting an extensive set of open-vocabulary activities from social media. From Twitter, where people share information about their past, present, and future events, we derive tens of thousands of ongoing activities, that is, activities happening at the time of posting, and use the attached metadata to build spatiotemporal models of these activities. Although public Twitter content is subject to self-censorship (not all activities are tweeted about), we compare the extracted data with unbiased survey data from the American Time Use Survey (ATUS) and show evidence that, for activities that are tweeted about, the resulting spatiotemporal profiles accurately capture the real distributions of those activities conditioned on time and location. Next, to better understand the set of activities present in this dataset (and what role self-censorship may play), we perform a qualitative analysis of the activities, their locations, and their temporal properties. Finally, we address predictive tasks centered on the relationship between activity and spatiotemporal context, aimed at supporting an individual’s information needs. Our predictive models, which incorporate text, personal history, and temporal features, show a significant performance gain over a strong frequency-based baseline.