Information Geometry


July 31, 2015


Sanjoy Dasgupta




This tutorial will focus on entropy, exponential families, and information projection. We’ll start by seeing the sense in which entropy is the only reasonable definition of randomness. We will then use entropy to motivate exponential families of distributions, which include the ubiquitous Gaussian, Poisson, and Binomial distributions as well as very general graphical models. The task of fitting such a distribution to data is a convex optimization problem with a geometric interpretation as an “information projection”: the projection of a prior distribution onto a linear subspace (defined by the data) so as to minimize a particular information-theoretic distance measure, the Kullback-Leibler divergence. This projection operation, which is more familiar in other guises, is a core optimization task in machine learning and statistics. We’ll study the geometry of this problem and discuss algorithms for it.
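To make the projection concrete, here is a minimal sketch (my own toy setup, not taken from the tutorial) over a finite outcome space. We project a uniform prior onto the set of distributions whose mean matches a target value, a single linear constraint, by minimizing KL divergence to the prior. The minimizer is known to have exponential-family form p_i ∝ q_i · exp(λ x_i), so finding it reduces to solving for the scalar natural parameter λ, here by bisection:

```python
import math

def info_projection(prior, xs, target_mean, tol=1e-12):
    """Minimize KL(p || prior) subject to sum_i p[i] * xs[i] == target_mean.

    The optimal p has the exponential-family form p_i ∝ prior_i * exp(lam * xs[i]);
    the mean of this tilted distribution is monotone increasing in lam, so a
    simple bisection on lam suffices for this one-constraint toy problem.
    """
    def tilted(lam):
        w = [q * math.exp(lam * x) for q, x in zip(prior, xs)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean_at(lam):
        p = tilted(lam)
        return sum(pi * x for pi, x in zip(p, xs))

    lo, hi = -50.0, 50.0  # bracket for the natural parameter lam
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return tilted(0.5 * (lo + hi))

# Uniform prior on {0, 1, 2, 3}; its mean is 1.5.  Projecting onto the
# constraint "mean = 2.0" tilts mass toward the larger outcomes.
prior = [0.25, 0.25, 0.25, 0.25]
xs = [0.0, 1.0, 2.0, 3.0]
p = info_projection(prior, xs, target_mean=2.0)
```

With many constraints (e.g., matching several feature expectations, as in fitting a graphical model), the same idea applies but λ becomes a vector, and one uses general convex-optimization methods rather than bisection.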


Sanjoy Dasgupta

“Sanjoy Dasgupta obtained his undergraduate degree from Harvard College in 1993. He worked for a year at Bell Laboratories and since then has been a graduate student at U.C. Berkeley, under the supervision of Umesh Vazirani. His thesis work, which will be completed this December, is motivated by the need for efficient and provably good learning algorithms for various commonly-used families of probability distributions.”