Cluster Trees and Neighborhood Graphs


August 7, 2015


Sanjoy Dasgupta




What information does the clustering of a finite data set reveal about the underlying distribution from which the data were sampled? This basic question has proved elusive even for the most widely used clustering procedures. One natural criterion is to seek clusters that converge (as the data set grows) to regions of high density. When all possible density levels are considered, this becomes a hierarchical clustering problem whose sought limit is called the “cluster tree”.

This talk will describe two simple algorithms for estimating this tree that implicitly construct a multiscale hierarchy of near-neighbor graphs on the data points. We’ll show that these procedures are consistent, and give rates of convergence using a percolation argument that also yields insight into how neighborhood graphs should be constructed.
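To make the multiscale idea concrete, here is a minimal sketch of one natural instantiation: sweep a connectivity radius r, build the r-neighborhood graph on the sample at each scale, and take its connected components as the clusters at that level. Since components can only merge as r grows, the levels nest into a tree. This is an illustration of the general scheme, not the speaker's exact algorithms; all function names are hypothetical.

```python
# Sketch: cluster-tree estimation via a sweep of r-neighborhood graphs.
# Components at smaller radii nest inside components at larger radii,
# so the recorded levels form a hierarchy (a discrete cluster tree).
from itertools import combinations


def neighborhood_graph(points, r):
    """Edges between all pairs of points at Euclidean distance <= r."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if dist(points[i], points[j]) <= r]


def components(n, edges):
    """Connected components of a graph on n vertices, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cluster_tree(points, radii):
    """For each radius (smallest first), the clusters at that scale."""
    return [(r, components(len(points), neighborhood_graph(points, r)))
            for r in sorted(radii)]
```

For example, two well-separated pairs of points form two clusters at a small radius and merge into one at a large radius. (Consistent estimation requires refinements such as restricting each level to points in high-density regions; this sketch shows only the graph-sweep skeleton.)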


Sanjoy Dasgupta

“Sanjoy Dasgupta obtained his undergraduate degree from Harvard College in 1993. He worked for a year at Bell Laboratories and since then has been a graduate student at U.C. Berkeley, under the supervision of Umesh Vazirani. His thesis work, which will be completed this December, is motivated by the need for efficient and provably good learning algorithms for various commonly-used families of probability distributions.”