Content Coverage Maximization on Word Networks for Hierarchical Topic Summarization
- Chi Wang
- Xiao Yu
- Yanen Li
- Chengxiang Zhai
- Jiawei Han
Proceedings of the 2013 ACM Conference on Information and Knowledge Management
Published by ACM – Association for Computing Machinery
This paper studies text summarization by extracting hierarchical topics from a given collection of documents. We propose a new approach to text modeling via network analysis. We convert documents into a word influence network and find the words that summarize the major topics with an efficient influence maximization algorithm. In addition, the influence capability of the topic words on other words in the network reveals the relations among the topic words. We then cluster the words and build hierarchies for the topics. Experiments on large collections of Web documents show that a simple method based on influence analysis is effective compared with existing generative topic modeling and random-walk-based ranking.
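To make the idea of selecting topic words by coverage concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes, purely for illustration, that a word "influences" every word it co-occurs with in at least one document, and it uses the standard greedy heuristic for maximum coverage. The function names `build_word_network` and `greedy_topic_words` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(docs):
    """Map each word to the set of words it is assumed to influence.

    Assumption for this sketch: influence = co-occurrence in a document.
    """
    neighbors = defaultdict(set)
    for doc in docs:
        words = set(doc.lower().split())
        for w in words:
            neighbors[w].add(w)  # every word covers itself
        for a, b in combinations(words, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def greedy_topic_words(neighbors, k):
    """Greedily pick up to k words, each adding the most newly covered words."""
    covered = set()
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        for w in sorted(neighbors):  # sorted for deterministic tie-breaking
            if w in chosen:
                continue
            gain = len(neighbors[w] - covered)
            if gain > best_gain:
                best, best_gain = w, gain
        if best is None:  # no remaining word covers anything new
            break
        chosen.append(best)
        covered |= neighbors[best]
    return chosen, covered


docs = ["apple banana", "apple cherry", "dog cat", "dog bird"]
network = build_word_network(docs)
topic_words, covered_words = greedy_topic_words(network, k=3)
print(topic_words)
```

In this toy collection the two hub words already cover the whole vocabulary, so the loop stops before a third word is chosen. The words each selected word covers could then serve as a crude cluster around it, loosely mirroring the clustering step mentioned in the abstract.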
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.