Scalable Moment-Based Inference for Latent Dirichlet Allocation

  • Chi Wang,
  • Xueqing Liu,
  • Yanglei Song,
  • Jiawei Han

Proceedings of the 2014 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Topic models such as Latent Dirichlet Allocation are widely used methods for text analysis. Recently, moment-based inference with provable performance has been proposed for topic models. Compared with inference algorithms that approximate the maximum likelihood objective, moment-based inference comes with theoretical guarantees for recovering the model parameters. One such inference method is tensor orthogonal decomposition, which requires only mild assumptions for exact recovery of topics. However, it scales poorly because it creates dense, high-dimensional tensors. In this work, we propose a speedup technique that leverages the special structure of these tensors. It is efficient in both time and space, and requires scanning the corpus only twice. It improves over the state-of-the-art inference algorithm by one to three orders of magnitude, while preserving the same inference ability.
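To give a feel for how structure in a moment can avoid dense storage, here is a minimal sketch. It is not the algorithm from the paper. It only illustrates the general idea on a second-order word-pair moment, under the assumption that the moment is estimated per document as (c cᵀ − diag(c)) / (l(l − 1)), where c is the sparse word-count vector and l the document length. The function names and the toy corpus are hypothetical.

```python
def pair_moment_matvec(docs, v):
    """Multiply the averaged word-pair moment by v without building a V x V matrix.

    docs: list of sparse count dicts {word_id: count} (hypothetical input format).
    Uses the identity (c c^T - diag(c)) v = c * (c . v) - c * v (elementwise),
    so the cost is proportional to the number of nonzero counts.
    """
    out = [0.0] * len(v)
    n_used = 0
    for counts in docs:
        length = sum(counts.values())
        if length < 2:  # a word pair needs at least two tokens
            continue
        n_used += 1
        scale = 1.0 / (length * (length - 1))
        dot = sum(c * v[w] for w, c in counts.items())
        for w, c in counts.items():
            out[w] += scale * (c * dot - c * v[w])
    if n_used == 0:
        return out
    return [x / n_used for x in out]


def pair_moment_dense(docs, vocab_size):
    """Reference version that materializes the dense V x V moment (for checking only)."""
    m = [[0.0] * vocab_size for _ in range(vocab_size)]
    n_used = 0
    for counts in docs:
        length = sum(counts.values())
        if length < 2:
            continue
        n_used += 1
        scale = 1.0 / (length * (length - 1))
        for i, ci in counts.items():
            for j, cj in counts.items():
                m[i][j] += scale * (ci * cj - (ci if i == j else 0))
    if n_used == 0:
        return m
    return [[x / n_used for x in row] for row in m]


# Toy corpus over a 3-word vocabulary (made-up data for illustration).
toy_docs = [{0: 2, 1: 1}, {1: 1, 2: 3}, {0: 1, 2: 1}]
toy_v = [1.0, -2.0, 0.5]
sparse_result = pair_moment_matvec(toy_docs, toy_v)
dense_m = pair_moment_dense(toy_docs, 3)
dense_result = [sum(dense_m[i][j] * toy_v[j] for j in range(3)) for i in range(3)]
```

The sparse routine touches only the nonzero counts of each document, while the dense reference needs memory quadratic in the vocabulary size. For the third-order tensors used in tensor decomposition, the dense cost would be cubic, which is the kind of blow-up the abstract refers to.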