Abstract

In this paper we present a new real-time image-based localization method for scenes that have been reconstructed offline using structure from motion. From input video, our method continuously computes six-degree-of-freedom camera pose estimates by efficiently tracking natural features and matching them to 3D points reconstructed by structure from motion. Our main contribution lies in efficiently interleaving a fast keypoint tracker that uses inexpensive binary feature descriptors with a new approach for direct 2D-to-3D matching. Our 2D-to-3D matching scheme avoids the need for online extraction of scale-invariant features. Instead, we construct an indexed database offline that contains multiple DAISY descriptors per 3D point, extracted at multiple scales. The key to the efficiency of our method lies in invoking DAISY descriptor extraction and matching sparingly during localization and in distributing this computation over a temporal window of successive frames. This enables the system to run in real time and achieve low per-frame latency over long durations. Our algorithm runs at over 30 Hz on a laptop and at 12 Hz on a low-power computer suitable for onboard computation on a mobile robot such as a micro-aerial vehicle. We have evaluated our method using ground truth and present results on several challenging indoor and outdoor sequences.
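The idea of distributing expensive matching over a temporal window of successive frames can be illustrated with a minimal sketch. This is not the implementation described in the paper. The class name `AmortizedMatcher`, the parameter `budget_per_frame`, and the callback `match_fn` are hypothetical, and the callback stands in for DAISY descriptor extraction and 2D-to-3D lookup. The sketch only shows how a fixed per-frame budget bounds latency while pending tracks are drained over several frames.

```python
from collections import deque


class AmortizedMatcher:
    """Illustrative scheduler that spreads costly matching over frames.

    All names here are assumptions made for this sketch, not the paper's API.
    """

    def __init__(self, budget_per_frame):
        # Maximum number of expensive matching calls allowed per frame.
        self.budget_per_frame = budget_per_frame
        # Track ids that still await expensive 2D-to-3D matching.
        self.pending = deque()
        # Maps a track id to the id of its matched 3D point.
        self.matched = {}

    def add_tracks(self, track_ids):
        """Queue newly tracked keypoints that have no 3D match yet."""
        for track_id in track_ids:
            if track_id not in self.matched and track_id not in self.pending:
                self.pending.append(track_id)

    def step(self, match_fn):
        """Run at most `budget_per_frame` matching calls for this frame.

        `match_fn(track_id)` is a hypothetical stand-in for descriptor
        extraction and database lookup; it returns a 3D point id or None.
        Tracks that fail to match are dropped rather than retried.
        Returns the number of tracks processed in this frame.
        """
        processed = 0
        while self.pending and processed < self.budget_per_frame:
            track_id = self.pending.popleft()
            point_id = match_fn(track_id)
            if point_id is not None:
                self.matched[track_id] = point_id
            processed += 1
        return processed
```

With a budget of two and five queued tracks, three consecutive calls to `step` drain the queue, so no single frame pays for all five matching operations.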