Abstract

Query-biased Web page summarization is the summarization of a Web page reflecting the relevance of it to a specific query. It plays an important role in search results representation of Web search engines. In this paper, we propose a learning-based query-biased Web page summarization method. The summarization problem is solved within the typical sentence selection framework. Different from existing Web page summarization methods that use page content or link context alone, both of them are considered as the sources of sentences in this work. Most of existing learning-based summarization methods treat summarization as a sentence classification problem and train a classifier to discriminate between extracted sentences and non-extracted sentences of all training documents. The basic assumption of these methods is that sentences from different documents are comparable with respect to the class information. In contrast to the classification scheme, a ranking scheme is introduced to rank extracted sentences higher than non-extracted sentences of each training document. The underlying assumption that sentences within a document are comparable is weaker and more reasonable than the assumption of classification-based scheme. Extensive results using intrinsic evaluation metrics gauge many aspects of the proposed method.