Joint Multiview Segmentation and Localization of RGB-D Images using Depth-Induced Silhouette Consistency

  • Chi Zhang,
  • Zhiwei Li (李志伟),
  • Rui Cai,
  • Hongyang Chao,
  • Yong Rui

Proc. of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016)

Published by Institute of Electrical and Electronics Engineers, Inc.

In this paper, we propose an RGB-D camera localization approach that takes an effective geometric constraint, i.e., silhouette consistency, into consideration. Unlike existing approaches, which usually assume the silhouettes are given, we consider more practical scenarios and generate the silhouettes for multiple views on the fly. Obtaining a set of accurate silhouettes requires precise camera poses to propagate segmentation cues across views; conversely, better localization requires accurate silhouettes to constrain the camera poses. The two problems are therefore intertwined and call for a joint treatment. Facilitated by the available depth, we introduce a simple but effective silhouette consistency energy term that binds the traditional appearance-based multi-view segmentation cost and the RGB-D frame-to-frame matching cost together. Optimizing the resulting objective with respect to the binary segmentation masks and the camera poses naturally fits the graph cut minimization framework and the Gauss-Newton non-linear least-squares method, respectively. Experiments show that the proposed approach achieves state-of-the-art performance on both tasks: image segmentation and camera localization.
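The pose-refinement half of the alternation is described as Gauss-Newton on a nonlinear least-squares energy. As a rough illustration of that solver pattern (not the paper's actual energy, which combines frame-to-frame matching and silhouette consistency terms over camera poses), here is a minimal generic Gauss-Newton sketch in Python/numpy; the exponential-fit residual at the bottom is a hypothetical stand-in problem:

```python
import numpy as np

def gauss_newton(residual, jacobian, theta0, iters=20, tol=1e-10):
    """Minimize 0.5 * ||r(theta)||^2 by iterating the Gauss-Newton update:
    solve the normal equations (J^T J) delta = -J^T r, then theta += delta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        r = residual(theta)          # residual vector at current estimate
        J = jacobian(theta)          # Jacobian of the residual, shape (m, n)
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        theta = theta + delta
        if np.linalg.norm(delta) < tol:
            break
    return theta

# Hypothetical toy problem: recover (a, b) in y = a * exp(b * x)
x = np.linspace(0.0, 1.0, 30)
a_true, b_true = 2.0, -1.5
y = a_true * np.exp(b_true * x)

def residual(theta):
    a, b = theta
    return a * np.exp(b * x) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(b * x)
    # Columns: d(residual)/da and d(residual)/db
    return np.stack([e, a * x * e], axis=1)

theta = gauss_newton(residual, jacobian, [1.0, -1.0])
```

In the paper's setting the unknowns would be the 6-DoF camera poses and the residuals would stack the RGB-D matching and silhouette-consistency terms, but the update structure is the same.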