Over the years, digital video has emerged as our medium of choice for capturing and sharing information across multiple platforms. This includes everything from user-generated online content to egocentric visual data captured using wearable devices. Systems that can automatically analyze videos are therefore becoming increasingly important, and will play a vital role in accomplishing several key objectives, such as building smarter robots, monitoring people’s health as they age, and preventing crime through improved surveillance. In this talk, I will address some of the big challenges in building automatic video analysis systems, focusing particularly on those that analyze video data at scale.
Using sports visualization as a motivating application, I will begin by discussing the problem of tracking key objects in an environment captured by multiple overlapping static cameras. I will present results of our framework tested on close to 300,000 frames of real soccer footage captured over a diverse set of playing conditions. Furthermore, I will present extensions of this problem to user-generated videos captured using hand-held mobile devices.
I will then focus on the problem of detecting the important interactions among key objects that constitute interesting events in videos. Using large-scale summarization of user-generated videos as a motivating application, I will discuss how millions of online images can be used efficiently as a prior to constrain the search for representative frames of user-generated videos.
Finally, I will talk about analyzing event sequences that constitute everyday human activities. In particular, I will cover sequence representations that attempt to learn the global structure of activities from their local event statistics. I will discuss how such a data-driven approach to activity modeling can help discover and characterize human activities, and learn the typical behaviors that are crucial for detecting anomalous activities in an environment.