Abstract

We present the first pipeline for real-time volumetric surface reconstruction and dense 6DoF camera tracking running purely on standard, off-the-shelf mobile phones. Using only the embedded RGB camera, our system allows users to scan objects of varying shape, size, and appearance in seconds, with real-time feedback during the capture process. Unlike existing state of the art methods, which produce only point-based 3D models on the phone, or require cloud-based processing, our hybrid GPU/CPU pipeline is unique in that it creates a connected 3D surface model directly on the device at 25Hz. In each frame, we perform dense 6DoF tracking, which continuously registers the RGB input to the incrementally built 3D model, minimizing a noise aware photoconsistency error metric. This is followed by efficient key-frame selection, and dense per-frame stereo matching. These depth maps are fused volumetrically using a method akin to KinectFusion, producing compelling surface models. For each frame, the implicit surface is extracted for live user feedback and pose estimation. We demonstrate scans of a variety of objects, and compare to a Kinect-based baseline, showing on average ∼ 1.5cm error. We qualitatively compare to a state of the art point-based mobile phone method, demonstrating an order of magnitude faster scanning times, and fully connected surface models.