Simple practical methods for estimating distances in large and sparse datasets like the web
- Ping Li | Stanford
Random projections are a popular method for estimating distances. Instead of computing L2 distances over a big data matrix, we multiply the big matrix with a very sparse random matrix (over [-1 0 1]) to produce a smaller matrix. Distances in the smaller matrix are close to distances in the big matrix, but the smaller matrix is easier to work with. We then generalize very sparse random projections by introducing a novel data structure for estimating L1 and Lp distances as opposed to L2 distances only. Finally, we end with a new method, conditional random sampling. The new method estimates a variety of distance functions (L1, L2, Lp, Chi-square and more) from a single data structure. A single data structure is more convenient than different data structures for different distance functions.
Speaker Details
Ping Li is an assistant professor at Cornell. His Ph.D. is from Stanford in Statistics (2007). Ping has perhaps the record for the most Microsoft Internships: one as a dev in Visual Studio, followed by three in MSR (two with Ken Church and one with Chris Burges). During the most recent internship, he popularized the use of MART, a method that was subsequently used by the MSRAD team in their top performing entry in the Ad Predict Contest, as well as a number of other search applications. Ping Li is among the 15 recipients of the ONR young investigator award in 2009.
-
-
Jeff Running
-
Watch Next
-
-
-
Accelerating MRI image reconstruction with Tyger
- Karen Easterbrook,
- Ilyana Rosenberg
-
-
-
-
From Microfarms to the Moon: A Teen Innovator’s Journey in Robotics
- Pranav Kumar Redlapalli
-
-
-