In this talk, I will set forth a general approach to many of the major problems in Machine Learning, including classification, regression and clustering, based on ideas from spectral graph theory. Applying this approach will yield a simple and clean methodology that gives very good solutions to these problems, outperforming the state-of-the-art on many of the standard data sets.
In recent years, a number of researchers have gained insight by fitting graphs to their data and then using these graphs to solve various problems. They have employed simply defined graphs that are easy to compute, associating a vertex of the graph with each data vector, and then connecting vertices whose vectors are sufficiently close, sometimes with weights depending on the distance. Not surprisingly, different results are obtained by the use of different graphs, and researchers have studied how to combine different graphs in a way that tends to give heavier weight to the better graphs.
I will examine what can be gained by choosing the graphs with more care. To this end, I will pose the question,
“What is the right graph to fit to a set of vectors in Rn?”
I will then propose an answer based on the graph Laplacian and a discrete analogue of the harmonic energy of a manifold, and I will discuss how to surmount the algorithmic challenges that arise in computing the optimal graphs according to this measure.
These graphs have several interesting theoretical properties and relate to a variety of other concepts in the machine learning literature, including manifold learning and nonlinear dimensionality reduction. In addition, I will show how they can be viewed as a generalization of both k-means clustering and singular value decompositions.
Joint work with Daniel Spielman and Samuel Daitch.