Abstract

KinectFusion is a method for real-time capture of dense 3D geometry of the physical environment using a depth sensor. The system allows large datasets of 3D scene reconstructions to be captured at very low cost. In this paper we discuss the properties of the generated data and evaluate in which situations the method is accurate enough to provide ground truth models for low-level image processing tasks such as stereo and optical flow estimation. The results suggest that the method is suitable for the fast acquisition of medium-scale scenes (a few meters across), filling a gap between structured light and LiDAR scanners. For such scenes, ground truth optical flow fields, for example, can be created with accuracies of approximately 0.1 pixel. We present an initial, high-quality dataset of 57 scenes that researchers can use today, as well as a new, interactive tool implementing the KinectFusion method. Such datasets can also serve as training data, e.g. for 3D recognition and depth inpainting.