Abstract

We present a new optimization-based parsing framework for the geometric analysis of a single image of a man-made environment. The framework models the scene as a composition of geometric primitives spanning different layers, from low level (edges) through mid-level (line segments, lines, and vanishing points) to high level (the zenith and the horizon). Inference in such a model thus jointly estimates (a) the grouping of edges into line segments, (b) the grouping of line segments into straight lines, (c) the grouping of lines into parallel families, and (d) the positions of the horizon and the zenith in the image. Such a unified treatment means that uncertainty information propagates between the layers of the model. This is in contrast to most previous approaches to the same problem, which either ignore the middle levels (line segments or lines) altogether or use a bottom-up, step-by-step pipeline.

For the evaluation, we consider the publicly available York Urban dataset of “Manhattan” scenes and also introduce a new, harder dataset of 103 urban outdoor images containing many non-Manhattan scenes. The comparative evaluation on the horizon estimation task demonstrates that our method attains higher accuracy and robustness than current state-of-the-art approaches.