In this paper, we describe a model-based approach to object recognition. Spatial relationships between matching primitives are modeled with a purely local bi-gram representation consisting of transition probabilities between neighboring primitives. Each matching primitive is a set of one, two, or three features (a single feature, a doublet, or a triplet). Doublets and triplets provide highly discriminative matching primitives and a reference frame that is invariant to similarity or affine transformations. New objects are recognized by finding trees of matching primitives in an image that obey the model learned for a specific object class. We propose a greedy approach, based on best-first search expansion, for creating these trees. Experimental results demonstrate that our method recognizes objects undergoing nonrigid transformations, for both object instance and category recognition. Furthermore, we show results for both unsupervised and semi-supervised learning.
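
To make the tree-building idea concrete, the following is a minimal sketch of greedy best-first expansion over neighboring primitives, scored by bi-gram transition probabilities. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function name `grow_tree`, the dictionary-based inputs, the use of discrete labels for primitives, and the `min_prob` cutoff are all hypothetical choices made for this example.

```python
import heapq
import math


def grow_tree(labels, neighbors, trans_prob, seed, min_prob=0.05):
    """Greedily grow a tree of primitives starting from `seed`.

    All inputs are hypothetical structures chosen for illustration:
      labels:     dict, primitive id -> model label of that primitive
      neighbors:  dict, primitive id -> iterable of neighboring primitive ids
      trans_prob: dict, (label_a, label_b) -> bi-gram transition probability
      min_prob:   assumed cutoff; less likely transitions are never expanded
    Returns (list of (parent, child) edges, total log-probability of the tree).
    """
    in_tree = {seed}
    edges = []
    total_logp = 0.0
    frontier = []  # min-heap on negated log-probability, so the best edge pops first

    def push_candidates(node):
        # Queue every transition from `node` to a neighbor not yet in the tree.
        for nb in neighbors.get(node, ()):
            if nb in in_tree:
                continue
            p = trans_prob.get((labels[node], labels[nb]), 0.0)
            if p > 0.0 and p >= min_prob:
                heapq.heappush(frontier, (-math.log(p), node, nb))

    push_candidates(seed)
    while frontier:
        neg_logp, parent, child = heapq.heappop(frontier)
        if child in in_tree:
            continue  # already reached through a more probable edge
        in_tree.add(child)
        edges.append((parent, child))
        total_logp -= neg_logp
        push_candidates(child)
    return edges, total_logp


# Toy example: four primitives with made-up labels and probabilities.
labels = {0: "a", 1: "b", 2: "c", 3: "d"}
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
trans_prob = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "d"): 0.8, ("c", "d"): 0.01}
edges, logp = grow_tree(labels, neighbors, trans_prob, seed=0)
```

In this sketch the most probable frontier transition is always expanded first, so the tree grows along the strongest local matches, and the accumulated log-probability could serve as a score for how well the tree obeys a learned class model. How seeds are chosen and how trees are scored in the actual method are not specified in the abstract.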