Exploiting Sparsity in Unsupervised Classification

  • Sina Jafarpour | Princeton University

We talk about two unsupervised classification projects in which the provided data have a low dimensional structure. We show how this sparsity information can be used to increase the classification accuracy.

First, we describe how to use the Internet Movie Database cast-list to automatically detect and classify the actors appearing in movies such as sleepless in Seattle. We present the way we automatically detect and cluster the faces in real-time, and we show an efficient way to classify the faces in a new movie by exploiting the recent results on dictionary learning and compressed sensing.

We then consider a more generic system with millions of Question and Answer pairs and use two semantic analysis methods (linked-LDA, and linked-LSA) to capture the relationship between questions and constituent words. We express improvements in performance using the recall and the F-measure. We conclude with a discussion of challenges for future work.

These results were obtained at AT&T in collaboration with Howard Karloff, Patrick Haffner, Srinivas Bangalore, Taniya Mishra, and Carlos Scheidegger.

Speaker Details

Sina Jafarpour is a second year PhD student at the Princeton University working with Prof. Rob Schapire and Prof. Rob Calderbank. His main interests is using sparsity in real applications including image authentication, deterministic compressed sensing, and large-scale text models. This summer, he is working with Chris Burges and the TMSN group.

    • Portrait of Jeff Running

      Jeff Running