Neighbourhood Component Analysis

Date

May 17, 2004

Speaker

Sam Roweis

Affiliation

University of Toronto

Overview

Say I give you a dataset of N points {xn} in D dimensions.
Can you find for me (up to rotation and isotropic scaling) a projection matrix A (of size d by D) such that when you apply nearest neighbour classification to the point set {yn = A xn} you get the best possible performance? Nearest neighbour classification is a very simple nonparametric method for supervised learning, and has several appealing properties: the decision surfaces are nonlinear, the quality of the predictions automatically improve as the amount of training data increases, and there is only a single hyperparameter to be tuned.

However, there are two significant problems. First, we must define what we mean by “nearest”, in other words we must specify a metric on the input space. Second, the computational load of the classifier is quite high at test time since we must store and search through the entire training set to find the neighbours of a query point before we can do classification. In this talk I will present an algorithm for linearly reducing the dimensionality of the input data in a way that preserves the performance of nearest neighbour classification as much as possible in the low dimensional space.

The algorithm simultaneouly reduces the computational load of the classifier at test time and learns a (low-rank) metric on the input space.Results on a variety of datasets show that the performance of nearest neighbour after using our dimensionality reduction technique consistently exceeds the performance after applying other linear dimensionality reduction methods such as PCA or LDA.

Speakers

Sam Roweis

Sam Roweis is an Assistant Professor in the Department of Computer Science at the University of Toronto. His research interests are in machine learning, data mining, and statistical signal processing. Roweis did his undergraduate degree at the University of Toronto in the Engineering Science program and earned his doctoral degree in 1999 from the California Institute of Technology working with John Hopfield. He did a postdoc with Geoff Hinton and Zoubin Ghahramani at the Gatsby Unit in London, and has also worked in several industrial research labs including Bell Labs, and Whizbang! Labs. He is the holder of a Canada Research Chair in Statistical Machine Learning and the winner of a Premier’s Research Excellence Award.