We describe a method to recover, fully automatically, 3D scene structure together with 3D camera positions from a sequence of images acquired by an unknown camera undergoing unknown movement. Unlike “tuned” systems, which use calibration objects or markers to recover this information and are therefore often limited to a particular scale, the approach of this paper is more general and can be applied to a large class of scenes. It is demonstrated here for interior and exterior sequences using both controlled-motion and handheld cameras.

The paper reviews computer vision research into structure and motion recovery, providing a tutorial introduction to the geometry of multiple views, estimation, and correspondence in video streams. The core method, which simultaneously extracts the 3D scene structure and camera positions, is applied to the automated recovery of VRML 3D textured models from a video sequence.