A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy

  • ,
  • Marina Danilevsky ,
  • Nihit Desai ,
  • Yinan Zhang ,
  • Phuong Nguyen ,
  • Thrivikrama Taula ,
  • Jiawei Han

Proceedings of the 2013 ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Published by ACM – Association for Computing Machinery

A high-quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications, such as search, summarization, and content browsing. In this paper, we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for clustering, extracting, and ranking topical phrases. Experiments with datasets from different domains illustrate our ability to generate hierarchies of high-quality topics represented by meaningful phrases.
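
As a rough illustration of the kind of structure the abstract describes, the sketch below builds a small topic tree in which every node carries a ranked list of mixed-length phrases. It is not the paper's algorithm: the names `Topic`, `rank_phrases`, `split_by_top_phrase`, and `build_hierarchy` are hypothetical, ranking is by raw frequency, and the split step is a deliberately naive stand-in for the clustering step, which the abstract does not specify.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Phrase = Tuple[str, ...]   # a tuple of words, so phrases of mixed length coexist
Doc = List[Phrase]         # a document reduced to the phrases it contains


@dataclass
class Topic:
    """One node of the hierarchy: a ranked phrase list plus subtopics."""
    phrases: List[Tuple[Phrase, int]]
    children: List["Topic"] = field(default_factory=list)


def rank_phrases(docs: List[Doc], top_k: int = 3) -> List[Tuple[Phrase, int]]:
    # Assumption: rank by raw frequency, breaking ties alphabetically.
    counts = Counter(p for doc in docs for p in doc)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def split_by_top_phrase(docs: List[Doc]) -> List[List[Doc]]:
    # Assumption: a toy stand-in for clustering. Pick the phrase found in the
    # most documents (but not all of them) and split on its presence.
    doc_freq = Counter(p for doc in docs for p in set(doc))
    candidates = [(p, n) for p, n in doc_freq.items() if n < len(docs)]
    if not candidates:
        return [docs]
    pivot = min(candidates, key=lambda kv: (-kv[1], kv[0]))[0]
    return [[d for d in docs if pivot in d], [d for d in docs if pivot not in d]]


def build_hierarchy(docs: List[Doc],
                    split: Callable[[List[Doc]], List[List[Doc]]],
                    depth: int) -> Topic:
    # Describe this topic, then recurse into each strictly smaller group.
    node = Topic(phrases=rank_phrases(docs))
    if depth > 0 and len(docs) > 1:
        for group in split(docs):
            if group and len(group) < len(docs):
                node.children.append(build_hierarchy(group, split, depth - 1))
    return node


# Made-up toy corpus, for illustration only.
docs = [
    [("data", "mining"), ("query", "processing")],
    [("data", "mining"), ("frequent", "pattern")],
    [("machine", "learning"), ("support", "vector", "machine")],
    [("machine", "learning"), ("data", "mining")],
]
root = build_hierarchy(docs, split_by_top_phrase, depth=1)
```

Passing the split step in as a function keeps the recursion independent of how documents are grouped, so a real clustering method could be substituted without touching `build_hierarchy`.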