Histograms of visual words (or textons) have proved effective in tasks such as image classification and object class recognition. A common approach is to represent an object class by a set of histograms, each one corresponding to a training exemplar. Classification is then achieved by k-nearest neighbour search over the exemplars.
In this paper we introduce two novelties to this approach. (i) We show that new, compact single-histogram models, estimated optimally from the entire training set, achieve classification accuracy equal or superior to that of the exemplar-based approach. The benefit of the single histograms is that they are much more efficient in terms of both memory and computation. (ii) We show that bag-of-visual-words histograms can provide an accurate pixel-wise segmentation of an image into object class regions. In this manner the compact models of visual object classes give simultaneous segmentation and recognition of image regions.
The approach is evaluated on the MSRC database, and its performance is shown to equal or exceed previously published results on this database.
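The contrast between exemplar-based k-nearest neighbour classification and a single histogram per class can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method as specified in the paper: the function names are hypothetical, the chi-squared distance is one common choice for comparing histograms, and the single-histogram model is formed simply by pooling and normalising the visual-word counts of a class, which may differ from the optimal estimate referred to above.

```python
import numpy as np

def normalise(counts):
    """Turn visual-word counts into a distribution (rows sum to 1)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=-1, keepdims=True)

def chi2(p, q, eps=1e-12):
    """Chi-squared distance between histograms (assumed distance measure)."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps), axis=-1)

def knn_classify(query, exemplars, labels, k=1):
    """Exemplar-based model: one histogram is stored per training exemplar."""
    d = chi2(normalise(exemplars), normalise(query))
    nearest = np.argsort(d)[:k]
    votes = np.bincount(labels[nearest])
    return int(np.argmax(votes))

def fit_single_histograms(exemplars, labels):
    """Compact model: one histogram per class.

    Assumption: the class histogram is the pooled, normalised word counts
    of all training exemplars of that class.
    """
    classes = np.unique(labels)
    models = np.stack(
        [normalise(exemplars[labels == c].sum(axis=0)) for c in classes]
    )
    return classes, models

def single_histogram_classify(query, classes, models):
    """Assign the class whose single histogram is closest to the query."""
    d = chi2(models, normalise(query))
    return int(classes[np.argmin(d)])

# Toy data: 4 visual words, 2 classes, 2 exemplars per class (made up).
exemplars = np.array([[8, 1, 1, 0],
                      [7, 2, 0, 1],
                      [0, 1, 9, 0],
                      [1, 0, 7, 2]])
labels = np.array([0, 0, 1, 1])
query = np.array([6, 2, 1, 1])

classes, models = fit_single_histograms(exemplars, labels)
knn_label = knn_classify(query, exemplars, labels, k=1)
single_label = single_histogram_classify(query, classes, models)
```

In this sketch the exemplar-based classifier stores and compares against one histogram per training exemplar, whereas the compact model stores and compares against one histogram per class, which is where the saving in memory and computation comes from. Applying the same classifier to the histogram of a window centred on each pixel would be one way to obtain a pixel-wise labelling, again under the assumptions stated here.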