Abstract

Many existing techniques in content-based video retrieval treat a video sequence as a whole when matching it against a query video or assigning it a text label. Such an approach has serious limitations when applied to human action retrieval, because an action may occur in only a sub-region of the frame and last for only a small portion of the video's length. In such situations, we essentially need to match subvolumes of the video sequences against the query video. A naive exhaustive search is impractical due to the large number of possible subvolumes in each video sequence. In this paper, we propose a novel framework for action retrieval that performs pattern matching at the subvolume level and is very efficient in handling a large corpus of videos. We construct an unsupervised random forest to index the video database, generate a score volume with Hough voting, and then employ a max sub-path search strategy to quickly locate the temporal and spatial positions of the action in all the video sequences in the database. We present action search experiments on challenging datasets to validate the efficiency and effectiveness of our system.