Probabilistic Models for Parsing Images

February 16, 2006
Xiaofeng Ren | U.C. Berkeley

A grand challenge of computer vision is to understand and parse natural images into boundaries, surfaces and objects. Solving this problem inevitably requires working with visual entities and cues of a heterogeneous nature, such as brightness and texture at the low level, contour and region grouping at the mid level, and shape recognition at the high level. Learning to represent and combine these entities and cues, together with the complexity of the visual world itself, calls for probabilistic models of image parsing. Many previous efforts along this line suffer from the lack of a compact representation, the lack of scale invariance, or the lack of comprehensive experimentation. We describe a scale-invariant image representation that uses piecewise linear approximations of contours and the constrained Delaunay triangulation (CDT) to complete gradientless gaps. On top of the CDT graph we develop conditional random fields (CRFs) for contour completion, figure/ground organization and object segmentation. Large datasets of human-annotated natural images are used for both training and evaluation. Our quantitative results are the first to demonstrate that mid-level visual cues work in general natural scenes. The CDT/CRF framework enables efficient representation and inference of both bottom-up and top-down information, and is hence applicable to a variety of vision problems. We extend our work to joint object recognition and segmentation, in particular finding people, in static images and video.

Speaker Details

Xiaofeng Ren received his B.S. in computer science from Zhejiang University, China, in 1997, and his M.S. in computer science from Stanford University in 2000. He has been a Ph.D. student and research assistant in the Computer Vision Group at U.C. Berkeley under the supervision of Jitendra Malik, and expects to receive his Ph.D. in May 2006. His research interests lie broadly in computer vision and artificial intelligence, and he has worked mainly on contour completion, image segmentation, figure/ground labeling and human body pose recovery.