From LSI to Probabilistic Topic Models: An Introduction to Topic Models


August 7, 2015


Chiranjib Bhattacharyya




Topic models attempt to discover themes, or topics, from a large collection of documents. Discovering themes in a document corpus is an important problem with a variety of applications, including web search and corpus browsing.

In this two-part tutorial, we will begin by introducing the background necessary for understanding topic models, focussing mainly on the EM algorithm and variational inference. In the second part of the talk we will review several models, starting with Latent Semantic Indexing (LSI), proposed in 1988, and moving to the more recent, now state-of-the-art, probabilistic topic models. Towards the end of the talk we will discuss recent theoretical results on provable topic models.


Chiranjib Bhattacharyya

Chiranjib Bhattacharyya is an Associate Professor in the Department of Computer Science and Automation (CSA), Indian Institute of Science (IISc). His research interests are in machine learning.