Vuvuzelas & Active Learning for Online Classification

  • Ulrich Paquet
  • Jurgen Van Gael
  • David Stern
  • Gjergji Kasneci
  • Ralf Herbrich
  • Thore Graepel

Computational Social Science and the Wisdom of Crowds Workshop (co-located with NIPS 2010)

Many online service systems leverage user-generated content from Web 2.0 style platforms such as Wikipedia, Twitter, Facebook, and many more. Often, the value lies in the freshness of this information (e.g. tweets, event-based articles, blog posts). This freshness poses a challenge for supervised learning models, as they frequently have to deal with previously unseen features.
In this paper we address the problem of online classification for tweets: how can a classifier be updated in an online manner so that it correctly classifies the latest “hype” on Twitter? We propose a two-step strategy to solve this problem. The first step follows an active learning strategy that selects the tweets for which a label would be most useful; each selected tweet is then forwarded to Amazon Mechanical Turk, where it is labeled by multiple users. The second step builds on a Bayesian corroboration model that aggregates the noisy labels provided by the users, taking their reliabilities into account.
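
The two steps can be illustrated with a minimal Python sketch. It is not the model from the paper: the abstract does not specify the selection criterion or the corroboration model, so the sketch assumes plain uncertainty sampling for the first step and, for the second, a naive Bayes aggregation in which each user independently reports the true binary label with a known reliability. The function names `most_uncertain` and `aggregate_labels` are hypothetical.

```python
import math


def most_uncertain(probs):
    """Assumed selection rule (uncertainty sampling): return the index of
    the tweet whose predicted P(label = 1) is closest to 0.5."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def aggregate_labels(labels, reliabilities, prior=0.5):
    """Posterior P(true label = 1) given noisy binary labels.

    Assumption: user j reports the true label with probability
    reliabilities[j], independently of the other users. Reliabilities
    and the prior must lie strictly between 0 and 1.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for y, r in zip(labels, reliabilities):
        # Each vote shifts the log-odds by the user's log-likelihood ratio.
        llr = math.log(r / (1.0 - r))
        log_odds += llr if y == 1 else -llr
    return 1.0 / (1.0 + math.exp(-log_odds))


# Step 1: pick the tweet the current classifier is least sure about.
chosen = most_uncertain([0.9, 0.55, 0.1])

# Step 2: combine three user labels for that tweet. One reliable user
# (0.9) votes 1, two weaker users (0.6 each) vote 0.
posterior = aggregate_labels([1, 0, 0], [0.9, 0.6, 0.6])
```

Under these assumptions the single reliable user outweighs the two weaker ones, which a simple majority vote would not capture. In practice the reliabilities are unknown and would themselves have to be inferred from the labels.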