Geometry and Massive Data


January 20, 2012


Ravi Kannan


Microsoft Research India


Talk by Dr. Ravi Kannan at TechVista 2012, Kolkata, India.
Modeling each data record in a problem as a vector is an important tool in modern computing; striking recent examples are Web search and recommendation systems. The vectors lie in very high-dimensional spaces (thousands of dimensions or more). Basic concepts from linear algebra, such as best-fit directions, turn out to play an important role in data analysis. The number of data records can be enormous, often in the millions or more, so random sampling is crucial to any algorithm dealing with such massive data, and the sampling has to be done on the fly. The narration here will be elementary and is intended to provoke a broad study of the mathematics and algorithms rather than any particular application.
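As one illustration of what sampling "on the fly" can mean, the sketch below draws a uniform random sample of k records from a stream in a single pass, without knowing the stream's length in advance. This is the standard reservoir sampling technique, offered here as an assumption about the kind of method meant; the abstract does not say which sampling scheme the talk uses. The function name reservoir_sample and its parameters are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k records chosen uniformly at random from an iterable.

    Illustrative sketch of reservoir sampling (Algorithm R); not taken
    from the talk. Uses O(k) memory and reads the stream exactly once.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Record i (0-indexed) is kept with probability k / (i + 1);
            # randint is inclusive at both ends.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


# Example: sample 5 record ids from a stream of 100,000 without storing it.
sample = reservoir_sample(iter(range(100_000)), 5, random.Random(0))
```

If the stream holds fewer than k records, the function simply returns all of them. Passing a seeded random.Random makes a run reproducible.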