Vuvuzelas & Active Learning for Online Classification
- Ulrich Paquet
- Jurgen Van Gael
- David Stern
- Gjergji Kasneci
- Ralf Herbrich
- Thore Graepel
Computational Social Science and the Wisdom of Crowds Workshop (co-located with NIPS 2010)
Many online service systems leverage user-generated content from Web 2.0-style platforms such as Wikipedia, Twitter, and Facebook. Often, the value of this content lies in its freshness (e.g. tweets, event-based articles, and blog posts). This freshness poses a challenge for supervised learning models, which frequently have to deal with previously unseen features.
In this paper we address the problem of online classification for tweets: how can a classifier be updated in an online manner so that it correctly classifies the latest “hype” on Twitter? We propose a two-step strategy to solve this problem. The first step follows an active learning strategy that selects the tweets for which a label would be most useful; each selected tweet is then forwarded to Amazon Mechanical Turk, where it is labeled by multiple users. The second step builds on a Bayesian corroboration model that aggregates the noisy labels provided by the users, taking their reliabilities into account.
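The two steps can be illustrated with a minimal Python sketch. It is not the model from the paper: the abstract does not specify the selection criterion or the corroboration model, so the sketch assumes plain uncertainty sampling for the first step and, for the second, a naive Bayes vote in which each worker has a fixed, known accuracy. The function names `select_most_uncertain` and `aggregate_labels`, the binary label set, and the example reliabilities are all hypothetical.

```python
import math


def select_most_uncertain(probabilities):
    """Step 1 (assumed criterion): pick the tweet whose predicted
    P(label = 1) is closest to 0.5, i.e. where the classifier is least sure."""
    return min(range(len(probabilities)),
               key=lambda i: abs(probabilities[i] - 0.5))


def aggregate_labels(votes, reliabilities, prior=0.5):
    """Step 2 (assumed model): posterior P(true label = 1) given binary
    votes from workers, each assumed correct with a known probability
    strictly between 0 and 1, and assumed independent of the others."""
    log_odds = math.log(prior / (1.0 - prior))
    for worker, vote in votes.items():
        r = reliabilities[worker]
        weight = math.log(r / (1.0 - r))  # reliable workers count for more
        log_odds += weight if vote == 1 else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical classifier outputs for three unlabeled tweets.
    chosen = select_most_uncertain([0.9, 0.55, 0.1])
    # Hypothetical worker votes on the chosen tweet.
    posterior = aggregate_labels(
        votes={"a": 1, "b": 0, "c": 0},
        reliabilities={"a": 0.9, "b": 0.6, "c": 0.6},
    )
    print(chosen, round(posterior, 3))
```

In this example one worker with assumed accuracy 0.9 outvotes two workers with assumed accuracy 0.6, which a simple majority vote would not allow. A full Bayesian treatment would also infer the reliabilities from the data rather than take them as given.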