Chapter 9.2 A Unified Framework for Video Summarization, Browsing and Retrieval

Ziyou Xiong; Yong Rui; Regunathan Radhakrishnan; Ajay Divakaran; Thomas S. Huang

Chapter 9.2 in The Image and Video Processing Handbook (2nd Edition), edited by Alan Bovik. Published by Academic Press, 2005.

Video content can be accessed using either a top-down or a bottom-up approach [1, 2, 3, 4]. The top-down approach, i.e., video browsing, is useful when we need to get the "essence" of the content. The bottom-up approach, i.e., video retrieval, is useful when we know exactly what we are looking for in the content, as shown in Fig. 1. In video summarization, what "essence" the summary should capture depends on whether the content is scripted or not. Since scripted content, such as news, drama, and movies, is carefully structured as a sequence of semantic units, one can get its essence by enabling a traversal through representative items from these semantic units. Hence, Table of Contents (ToC) based video browsing caters to summarization of scripted content. For instance, a news video composed of a sequence of stories can be summarized or browsed using a key-frame representation for each of the shots in a story. However, summarization of unscripted content, such as surveillance and sports, requires a "highlights" extraction framework that captures only the remarkable events that constitute the summary.
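
The ToC idea for scripted content can be pictured as a small hierarchy of stories, shots, and key frames. The following is only a minimal sketch under assumed names: the `Shot` and `Story` classes, the `toc_summary` function, and the frame numbers are hypothetical illustrations, not structures defined in the chapter.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shot:
    start_frame: int
    end_frame: int
    key_frame: int  # assumed: index of the frame chosen to represent the shot


@dataclass
class Story:
    title: str
    shots: List[Shot] = field(default_factory=list)


def toc_summary(stories: List[Story]) -> List[Tuple[str, List[int]]]:
    """Return one (story title, key frames) entry per story, in playback order."""
    return [(story.title, [shot.key_frame for shot in story.shots])
            for story in stories]


# Made-up example: a news video with two stories.
news = [
    Story("Headlines", [Shot(0, 120, 60), Shot(121, 300, 200)]),
    Story("Weather", [Shot(301, 450, 375)]),
]
summary = toc_summary(news)
```

Traversing `summary` story by story corresponds to browsing the video through its representative items. Unscripted content would not fit this hierarchy, which is why the text calls for a separate highlights extraction framework.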