Text summarization: News and Beyond


August 11, 2004


Kathy McKeown


Columbia University


Redundancy in large text collections, such as the web, creates both problems and opportunities for natural language systems. On the one hand, the presence of numerous sources conveying the same information causes difficulties for end users of search engines and news providers; they must read the same information over and over again. On the other hand, redundancy can be exploited to identify important and accurate information for applications such as summarization and question answering.

Columbia’s Newsblaster system for online news summarization exploits online redundancy to generate a summary, at the same time creating a concise synopsis of recent events for end users. Newsblaster crawls the web nightly for news articles, clusters articles on the same event, and generates a summary of each event. In this talk, I will present the current capabilities of Newsblaster, with some focus on its ability to generate and edit text. I will then turn to our ongoing work, which goes beyond summarization of English news. Our research on summarization of multilingual news requires us to deal with noisy input; we rely on state-of-the-art machine translation systems and use information that is available at the time of summarization to improve the fluency of the summary. We are also moving to summarization of other media, including email and meetings. Both of these media also require the ability to handle noisy input, and they pose the additional challenge of handling features of dialog.


Kathy McKeown

Kathleen R. McKeown is a Professor of Computer Science at Columbia University. Her research interests include text summarization, natural language generation, multi-media explanation, digital libraries, concept-to-speech generation, and natural language interfaces. McKeown received her Ph.D. in Computer Science from the University of Pennsylvania in 1982 and has been at Columbia since then. In 1985 she received a National Science Foundation Presidential Young Investigator Award, in 1991 she received a National Science Foundation Faculty Award for Women, and in 1994 she was selected as an AAAI Fellow. McKeown is also quite active nationally. She is a board member of the Computing Research Association and serves as secretary of the board. She served as President of the Association for Computational Linguistics in 1992, Vice President in 1991, and Secretary-Treasurer for 1995-1997. She has served on the Executive Council of the American Association for Artificial Intelligence and was co-program chair of its annual conference in 1991.