Beyond bag-of-features: adding spatial and shape information


August 25, 2006


Cordelia Schmid


INRIA Rhone-Alpes


Bag-of-features representations have recently shown very good performance for image category classification. However, they are orderless and based on appearance features only. In this talk we show how to integrate spatial information and how to add shape features.

First, we present a method for recognizing scene categories based on approximate global geometric correspondence. It works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting “spatial pyramid” is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks.
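As an illustration of the partition-and-histogram idea described above, here is a minimal sketch, not the speakers' implementation. It assumes local features have already been quantized into visual-word indices and that their image coordinates are normalized to [0, 1]. The function name `spatial_pyramid`, its parameters, and the plain unweighted concatenation of levels are assumptions made for this example; the abstract does not specify how the levels are weighted or compared.

```python
import numpy as np

def spatial_pyramid(xy, words, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over a pyramid of grids.

    xy         : (N, 2) feature coordinates, normalized to [0, 1] (assumption)
    words      : (N,) integer visual-word index of each feature
    vocab_size : number of visual words
    levels     : finest level L; level l uses a 2**l x 2**l grid
    """
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each feature; clip so a coordinate of exactly 1.0
        # lands in the last cell instead of falling off the grid.
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = row * cells + col
        # One histogram bin per (cell, visual word) pair.
        flat = cell_id * vocab_size + words
        parts.append(np.bincount(flat, minlength=cells * cells * vocab_size))
    # Level 0 alone is the ordinary orderless bag-of-features histogram.
    return np.concatenate(parts)

pyramid = spatial_pyramid(
    xy=[[0.1, 0.1], [0.9, 0.9], [0.9, 0.1]],
    words=[0, 1, 0],
    vocab_size=2,
    levels=1,
)
```

With `levels=0` the function reduces to a standard bag-of-features histogram, which is why the pyramid can be viewed as an extension of the orderless representation.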

Second, we describe a method which exploits spatial relations between features using the object boundaries provided during supervised training. It increases the weights of features that agree on the position and shape of the object and suppresses the weights of background features. The proposed representation is thus richer and more robust to background clutter. Experimental results show that our approach improves over whole-image classification. Furthermore, we apply the spatial model to object localization.

Third, a shape-based object detection technique is presented. It is based on pairs of connected contour segments, which are local features of intermediate complexity. Image windows are coarsely subdivided into tiles, each described by a bag of these features. After training a window classifier, novel object instances are localized via a multi-scale sliding-window mechanism. An extensive evaluation shows that the approach can successfully localize shape-based objects in cluttered scenes, while allowing for scale changes and intra-class variations.
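The window description and multi-scale search outlined above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the method presented in the talk: the contour-segment features are assumed to be already extracted and quantized into integer ids, and the trained window classifier is represented by a hypothetical `score_fn` callable. The names `tiled_bag`, `sliding_windows`, `localize`, and all default parameter values are invented for this example.

```python
import numpy as np

def tiled_bag(xy, words, window, vocab_size, tiles=(2, 2)):
    """Describe a window by one bag-of-features histogram per tile."""
    x0, y0, w, h = window
    xy = np.asarray(xy, dtype=float).reshape(-1, 2)
    words = np.asarray(words, dtype=int)
    rows, cols = tiles
    # Keep only the features that fall inside the window.
    inside = ((xy[:, 0] >= x0) & (xy[:, 0] < x0 + w) &
              (xy[:, 1] >= y0) & (xy[:, 1] < y0 + h))
    col = np.minimum(((xy[inside, 0] - x0) / w * cols).astype(int), cols - 1)
    row = np.minimum(((xy[inside, 1] - y0) / h * rows).astype(int), rows - 1)
    flat = (row * cols + col) * vocab_size + words[inside]
    return np.bincount(flat, minlength=rows * cols * vocab_size)

def sliding_windows(img_w, img_h, base=(64, 64),
                    scales=(1.0, 1.5, 2.0), stride_frac=0.25):
    """Yield (x0, y0, w, h) windows over several scales."""
    for s in scales:
        w, h = base[0] * s, base[1] * s
        if w > img_w or h > img_h:
            continue  # window does not fit in the image at this scale
        for y0 in np.arange(0.0, img_h - h + 1e-9, h * stride_frac):
            for x0 in np.arange(0.0, img_w - w + 1e-9, w * stride_frac):
                yield (float(x0), float(y0), w, h)

def localize(xy, words, img_w, img_h, vocab_size, score_fn, threshold=0.0):
    """Score every window with score_fn (a stand-in for the trained
    classifier) and return detections above threshold, best first."""
    detections = []
    for win in sliding_windows(img_w, img_h):
        score = score_fn(tiled_bag(xy, words, win, vocab_size))
        if score > threshold:
            detections.append((score, win))
    return sorted(detections, key=lambda d: d[0], reverse=True)
```

In practice a detector of this kind would also merge overlapping detections (for example by non-maximum suppression); that step is omitted here because the abstract does not describe it.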

This is joint work with V. Ferrari, F. Jurie, S. Lazebnik, M. Marszalek and J. Ponce.


Cordelia Schmid

Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis on “Local Greyvalue Invariants for Image Matching and Retrieval” received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled “From Image Matching to Learning Visual Models”. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at INRIA Rhone-Alpes, where she is a research director and directs the INRIA team called LEAR for LEArning and Recognition in Vision. Dr. Schmid is the author of over fifty technical publications. She has been an Associate Editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence (2001–2005) and for the International Journal of Computer Vision (2004–present). She was program chair of the 2005 IEEE Conference on Computer Vision and Pattern Recognition, and she has served on the program committees of several major conferences, notably as an area chair for CVPR’00, ECCV’02, ICCV’03, ECCV’04, CVPR’04, ICCV’05, NIPS’05 and NIPS’06. In 2006, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time.