In the first part of my talk, I will present a method for calibrating and synchronizing a network of cameras observing an event from multiple viewpoints. All the necessary information is recovered by analyzing the motion of the silhouettes in the multiple video streams. This approach eases the deployment of camera networks for video-based, data-driven 3D modeling, which is useful for digitizing sports, cultural performances, and events recorded by surveillance camera networks. The method requires neither physical access to the scene nor a pre-calibration phase involving specific calibration objects. It has been tested on various datasets acquired by other vision researchers.
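As a toy illustration of the synchronization step, the sketch below recovers the temporal offset between two video streams by cross-correlating a simple per-frame motion signal. Using silhouette area as the signal and a bounded lag search are my own simplifying assumptions for illustration; they are a stand-in for, not a description of, the silhouette-motion analysis in the talk.

```python
def recover_offset(sig_a, sig_b, max_lag=5):
    """Return the lag (in frames) at which sig_b best aligns with sig_a.

    sig_a, sig_b: per-frame scalar motion signals, e.g. a hypothetical
    silhouette-area measurement from each camera.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate over the frames where the shifted signals overlap.
        score = sum(
            sig_a[i] * sig_b[i + lag]
            for i in range(len(sig_a))
            if 0 <= i + lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Camera B starts recording 2 frames after camera A, so its signal is a
# delayed copy of camera A's signal; the peak correlation recovers the shift.
area_a = [0, 0, 1, 3, 5, 3, 1, 0, 0, 0]
area_b = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0]
print(recover_offset(area_a, area_b))  # 2
```

In practice, sub-frame accuracy and robustness to outliers matter, but the core idea of aligning per-camera motion signals carries over.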
Next, I will present my recent work on multi-view stereo. Recently, global variational methods that optimize a suitable photo-consistency-based energy functional have shown promising results, but they fail to exploit the strong cues provided by silhouettes. I will present a graph-cut-based volumetric stereo formulation in which silhouette constraints can be enforced exactly; this counters the minimal-surface bias of the global methods and provides stronger overall guarantees. Instead of operating on a uniform 3D grid, I also propose a photo-consistency-driven adaptive graph construction strategy that reduces the memory and computational requirements of the overall approach and makes it possible to work at the resolution required for high-quality reconstructions. The effectiveness of the method is demonstrated on the multi-view stereo benchmark and on other 3D photography datasets.
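To make the graph-cut machinery concrete, here is a minimal, self-contained sketch: an Edmonds-Karp max-flow/min-cut solver run on a toy volumetric graph. The two-ray chain layout and the specific edge capacities are illustrative assumptions of mine, not the adaptive graph or silhouette-constrained formulation described above; the point is only that, by max-flow/min-cut duality, minimizing a photo-consistency energy reduces to a min-cut computation.

```python
from collections import defaultdict, deque

def add_edge(cap, u, v, c):
    # Directed edge u->v with capacity c, plus a zero-capacity
    # reverse entry v->u so BFS can traverse the residual graph.
    cap.setdefault(u, {})[v] = cap.setdefault(u, {}).get(v, 0) + c
    cap.setdefault(v, {}).setdefault(u, 0)

def max_flow(cap, s, t):
    # Edmonds-Karp: augment along shortest residual paths until none remain.
    # The returned flow value equals the min-cut cost, i.e. the minimum of
    # the (toy) photo-consistency energy encoded in the edge capacities.
    flow, total = defaultdict(int), 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if v not in parent and c - flow[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:
            flow[(u, v)] += aug
            flow[(v, u)] -= aug
        total += aug

# Two independent viewing rays, each a chain of voxel-boundary edges whose
# capacity is a made-up photo-consistency cost; the min cut selects the
# cheapest surface crossing on each ray: min(5, 1, 4) + min(2, 6, 3) = 3.
cap = {}
for ray, costs in {"a": [5, 1, 4], "b": [2, 6, 3]}.items():
    nodes = ["src", f"{ray}1", f"{ray}2", "sink"]
    for (u, v), c in zip(zip(nodes, nodes[1:]), costs):
        add_edge(cap, u, v, c)
print(max_flow(cap, "src", "sink"))  # 3
```

Real volumetric stereo graphs are vastly larger and couple neighboring rays through regularization edges, which is precisely why the adaptive, photo-consistency-driven graph construction matters for memory and runtime.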
Finally, I will also talk about an interactive image-based modeling system for generating textured 3D models of architectural scenes. The system combines 2D user interaction with multi-view geometric information recovered by performing structure-from-motion analysis on the input photographs.