Object Class Recognition at a Glance

John Winn; Antonio Criminisi

Proc. Conf. Computer Vision and Pattern Recognition (CVPR) -- Video Track | January 2006

This video shows our real-time object class recognition system at work. Object class recognition is a very challenging problem. The difficulty lies in capturing the variability in appearance and shape of different objects belonging to the same class, while avoiding confusion between objects from different classes. However, state-of-the-art algorithms such as [2] can deliver high classification accuracy at interactive speed when dealing with a limited number of classes (around ten).

Following the texton-based modeling approach in [2], we have developed an application for real-time segmentation and recognition of objects placed on a table top (figure 1). The system comprises two steps: object segmentation and classification. First, each object region is separated from the table top by running a patch-based classifier that discriminates between the class “table” and everything else. This technique is very different from more conventional background subtraction and is robust to shadows, lighting changes and camera shake or motion. Second, once all the non-table connected regions have been extracted, each is classified as belonging to one of fifteen object classes using the same discriminative technique.

Each classifier is a random forest discriminative model [1], [3] using pixel difference features. The classifier is learned similarly to [2] to achieve maximum generalisation with high efficiency and is designed to be invariant both to rotation and to small changes in scale. Our features are computed both on the RGB image (thus providing information about appearance) and on the binary segmentation mask (thus capturing information about object shape). Each node in the learnt random decision trees is hence associated with either appearance or shape; figure 2 illustrates an example. The use of shape features is a key component of the recognition classifier, since the shape information boosts the accuracy significantly (by more than 10%).

Our algorithm runs on 320 × 240 images at up to 20 frames per second, with an overall accuracy of around 90%. Training our discriminative class models for 15 classes from 600 training images takes only about ten minutes.
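
As a rough illustration of the two ingredients named above, the following sketch extracts connected non-table regions from a binary mask and evaluates a pixel difference feature on either an appearance channel or the mask itself. It is a minimal sketch under stated assumptions, not the system's implementation: the function names `connected_regions` and `pixel_difference`, the 4-connectivity, the offset-based form of the feature, and the toy data are all hypothetical. In the real system the mask comes from the patch-based "table" classifier; here it is hard-coded. Rotation and scale invariance are not modelled.

```python
from collections import deque


def connected_regions(mask):
    """Return the 4-connected regions of truthy pixels in a 2D mask.

    `mask` is a list of rows; each region is a list of (row, col) tuples.
    The choice of 4-connectivity is an assumption made for this sketch.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited object pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def pixel_difference(channel, p, offset_a, offset_b):
    """Difference of two pixel values at fixed offsets from position `p`.

    `channel` may be one colour channel (appearance) or the binary
    mask (shape). Reads outside the image return 0 (an assumption).
    """
    def read(dy, dx):
        y, x = p[0] + dy, p[1] + dx
        if 0 <= y < len(channel) and 0 <= x < len(channel[0]):
            return channel[y][x]
        return 0

    return read(*offset_a) - read(*offset_b)


# Toy non-table mask containing two separate objects.
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
# Toy red channel aligned with the mask (hypothetical values).
red = [
    [10, 200, 210, 10, 10],
    [10, 205, 215, 10, 10],
    [10, 10, 10, 10, 90],
    [10, 10, 10, 95, 99],
]

regions = connected_regions(mask)
shape_feature = pixel_difference(mask, (0, 1), (0, 0), (0, -1))
appearance_feature = pixel_difference(red, (0, 1), (0, 1), (0, -1))
print([len(r) for r in regions], shape_feature, appearance_feature)
```

A node in a decision tree would typically compare one such feature value with a learned threshold. Evaluating the same function on the mask or on a colour channel mirrors the split between shape nodes and appearance nodes described above.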