Gesture and Speech for Video Content Navigation

  • Boon-Lock Yeo,
  • Gary Bradski,
  • Minerva M. Yeung

This article describes ongoing research on the use of computer vision, gesture, and speech recognition techniques as a natural interface for video content navigation, and on the design of a navigation and browsing system built around these natural means of human-computer interaction. For consumer applications, video content navigation presents two challenges: (1) how to parse and summarize multiple video streams in an intuitive and efficient manner, and (2) what type of interface will enhance the ease of use for video browsing and navigation in a living-room setting or an interactive environment. In this paper, we address these issues and propose techniques that combine video content navigation with gesture and speech recognition, seamlessly and intuitively, in an integrated system. We present a new type of browser for browsing and navigating video content, as well as a gesture and speech recognition interface for this browser.