Joint work with Jean Ponce at UIUC, and Cordelia Schmid, Jianguo Zhang, and Marcin Marszalek at INRIA Rhone-Alpes.
The key to success in many visual recognition tasks is designing the right low-level image features and combining them into rich, discriminative representations of the classes of interest. My talk will focus on representations based on local invariant features. First, I will discuss a bag-of-features approach originally designed for recognizing images of textured surfaces under perspective distortions and non-rigid deformations. Although this approach has sufficient descriptive power for modeling textures, it appears ill-suited for modeling objects, because it ignores spatial relations and makes no distinction between object features and clutter. However, in a recent large-scale comparative evaluation, we have shown that bags of features, combined with an appropriate SVM kernel, can be surprisingly effective for object categorization in the presence of substantial clutter and intra-class variation. I will close by describing ongoing efforts to develop a more "structured" alternative to bags of features for object recognition, namely, a representation based on semi-local parts: groups of features characterized by stable appearance and geometric layout.
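For readers unfamiliar with the pipeline, a minimal sketch of the bag-of-features idea follows. It assumes that local descriptors have already been extracted and that a codebook (visual vocabulary) is given, for example from k-means clustering; each descriptor is assigned to its nearest codeword and the image becomes a normalized histogram of codeword counts. The exponentiated chi-squared kernel shown here is one common choice of histogram kernel for an SVM and is an illustrative assumption, not necessarily the exact kernel used in the evaluation mentioned above; the names `bag_of_features`, `chi2_distance`, `chi2_kernel`, and the parameter `gamma` are hypothetical.

```python
import math

def bag_of_features(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared Euclidean
    distance) and return the normalized histogram of codeword counts.
    Spatial layout of the features is discarded, as in any bag of features."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def chi2_distance(h1, h2):
    """Chi-squared distance 0.5 * sum((a - b)^2 / (a + b)), skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def chi2_kernel(h1, h2, gamma=1.0):
    """Exponentiated chi-squared kernel exp(-gamma * D); gamma is a free
    (hypothetical) scale parameter that would normally be tuned."""
    return math.exp(-gamma * chi2_distance(h1, h2))

# Toy usage: a 2-word codebook in a 2-D descriptor space (illustrative only).
codebook = [[0.0, 0.0], [1.0, 1.0]]
h_a = bag_of_features([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]], codebook)  # [2/3, 1/3]
h_b = bag_of_features([[1.0, 0.9], [1.2, 1.0]], codebook)              # [0.0, 1.0]
print(round(chi2_kernel(h_a, h_b), 4))  # 0.6065
```

The resulting kernel values can be fed to any SVM implementation that accepts a precomputed kernel matrix; the point of the sketch is only that the classifier compares orderless histograms, which is why clutter and spatial relations are not modeled explicitly.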