A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy
- Chi Wang
- Marina Danilevsky
- Nihit Desai
- Yinan Zhang
- Phuong Nguyen
- Thrivikrama Taula
- Jiawei Han
Proceedings of the 2013 ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Published by ACM – Association for Computing Machinery
A high-quality hierarchical organization of the concepts in a dataset at different levels of granularity has many valuable applications, such as search, summarization, and content browsing. In this paper, we propose an algorithm for recursively constructing a hierarchy of topics from a collection of content-representative documents. We characterize each topic in the hierarchy by an integrated ranked list of mixed-length phrases. Our mining framework is based on a phrase-centric view for clustering, extracting, and ranking topical phrases. Experiments with datasets from different domains illustrate our ability to generate hierarchies of high-quality topics represented by meaningful phrases.
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.