Abstract

In this paper, we propose a weakly supervised approach to learning hierarchical object representations for visual recognition. Learning proceeds in a bottom-up manner to discover latent visual patterns at multiple scales. To reduce the interference of complex backgrounds in natural images, bounding boxes of foreground objects are used as weak supervision during learning to promote the visual patterns that are more closely related to the target objects. The difference between foreground-object and background patterns is relatively vague at low levels but becomes more distinct as features are transformed to higher levels. At test time, an input image is matched against the learnt patterns level by level, and the responses at each level form a hierarchy of representations that indicates the likelihood of the target object occurring at various scales. Experiments on two PASCAL datasets show encouraging results for visual recognition.