Combinatorial Approach to Data Mining

  • Yury Lifshits | Caltech

In many algorithmic problems like nearest neighbor search, clustering, near-duplicate detection or decentralized navigation input dataset is described either by some explicit representation (e.g. vectors) or by oracle access to pairwise distances.

In this talk we assume that our knowledge about dataset is much more limited. All informaton provided to an algorithm has the form “A is more similar to C than B is”. Thus, we have only “comparative” information. It turns out that with some reasonable “consistancy assumption” all mentioned problems can be solved efficiently without any access to actual distance values. We also show a connection between combinatorial framework and concepts of doubling dimension and Karger-Ruhl dimension.

Speaker Details

Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at St.Petersburg (2007). He is currently a postdoc at Caltech. He won two gold medals at International Mathematical Olympiads, received The Best Paper Award in Application Track of CSR’07 and The Yandex Teaching Award (2006) for his course “Algorithms for Internet”. Yury is a maintainer of “The Homepage of Nearest Neighbors”: http://simsearch.yury.nameRecently he published an overview of similarity search as a Youtube video lecture: http://www.youtube.com/watch?v=MsRTrO_p6yE

    • Portrait of Jeff Running

      Jeff Running