Analysis and detection of objects in images, or of actions in video sequences, require a complex notion of similarity across visual data. Existing approaches typically extract informative parameters or models learned from many prior examples of the visual data (object or action) of interest. These approaches, however, are often restricted to a small set of pre-defined classes of visual data, and do not generalize to scenarios with unfamiliar objects or actions. Moreover, in many realistic settings one has only a single example of the object or action of interest, or even no explicit example whatsoever of what one is looking for.
In this talk I will show how the global similarity of different complex visual data can be inferred by employing local similarities within and across these visual data. I will demonstrate the power of this approach on several example problems. These include:
- Prediction of missing visual information in images and videos.
- Detection and retrieval of complex objects in cluttered images using a single example – often only a rough hand-sketch of the object of interest.
- Detection of complex actions performed by differently dressed people against different backgrounds, based on a single example clip (without requiring foreground/background segmentation or motion estimation).
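To make the core idea concrete, a crude sketch of measuring global similarity through local similarities is given below: one image is scored by how well it can be composed from small patches of another. This is only a hypothetical illustration under simplifying assumptions (grayscale arrays, sum-of-squared-differences patch matching, naive exhaustive search); the function names and scoring choices are my own, not the talk's actual algorithm.

```python
import numpy as np

def extract_patches(img, k=3):
    """Collect every k x k patch of a 2-D array as a flat vector."""
    H, W = img.shape
    patches = [img[i:i + k, j:j + k].ravel()
               for i in range(H - k + 1)
               for j in range(W - k + 1)]
    return np.stack(patches)

def composition_distance(query, reference, k=3):
    """Score how well `query` can be composed from local pieces of
    `reference`: for each query patch, find its best-matching reference
    patch (sum of squared differences) and average those distances.
    Lower means the query is better explained by the reference."""
    qp = extract_patches(query, k)
    rp = extract_patches(reference, k)
    # Pairwise SSD between every query patch and every reference patch
    # (exhaustive; real systems would use an approximate nearest-neighbor
    # search over a multi-scale patch ensemble).
    d = ((qp[:, None, :] - rp[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(0)
a = rng.random((12, 12))
b = rng.random((12, 12))
print(composition_distance(a, a))  # 0.0: an image composes itself perfectly
print(composition_distance(a, b))  # > 0: unrelated data compose less well
```

Note the asymmetry of this score: `query` is explained in terms of `reference`, not vice versa, which is what lets a single example clip or rough sketch act as the reference for detection.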
Joint work with Michal Irani (the first part also with Yonatan Wexler).