The “active learning” model is motivated by scenarios in which it is easy to amass vast quantities of unlabeled data (images and videos off the web, speech signals from microphone recordings, and so on) but costly to obtain their labels. Like supervised learning, the goal is ultimately to learn a classifier. But the labels of training points are hidden, and each of them can be revealed only at a cost. The idea is to query just a few labels that are especially informative about the decision boundary, and thereby to obtain an accurate classifier at significantly lower cost than regular supervised learning.
There are two distinct ways of conceptualizing active learning, which lead to rather different querying strategies. The first treats active learning as an efficient search through a hypothesis space of candidates, while the second has to do with exploiting cluster or neighborhood structure in data. This talk will show how each view leads to active learning algorithms that can be made efficient and practical, and have provable label complexity bounds that are in some cases exponentially lower than for supervised learning.