Abstract

We introduce an approach to accurately detect and segment partially occluded objects in various viewpoints and scales. Our main contribution is a novel framework for combining object-level descriptions (such as position, shape, and color) with pixel-level appearance, boundary, and occlusion reasoning. In training, we exploit a rough 3D object model to learn physically localized part appearances. To find and segment objects in an image, we generate proposals based on the appearance and layout of local parts. The proposals are then refined after incorporating object-level information, and overlapping objects compete for pixels to produce a final description and segmentation of objects in the scene. A further contribution is a novel instance penalty, which is handled very efficiently during inference. We experimentally validate our approach on the challenging PASCAL’06 car database.