Abstract

As users turn to large-scale social media systems like Twitter for topic-based content exploration, they quickly face the issue that there may be hundreds of thousands of items matching any given topic they might query. Given the scale of the potential result sets, how does one identify the “best” or “right” set of items? We explore a solution that aligns characteristics of the information space, including specific content attributes and the information diversity of the result set, with measurements of human information processing, including engagement and recognition memory. Using Twitter as a test bed, we propose a greedy iterative clustering technique for selecting a set of items on a given topic that matches a specified level of diversity. In a user study, we show that our proposed method yields sets of items that are, on balance, more engaging, better remembered, and rated as more interesting and informative than those produced by baseline techniques. Additionally, diversity did appear to matter to study participants as they consumed content. However, somewhat surprisingly, we also observe that content was perceived as more relevant when it was either highly homogeneous or highly heterogeneous. In light of these findings, we discuss implications for the selection and evaluation of topic-centric item sets in social media contexts.