Abstract

This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences.
Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here, motion, colour and contrast cues are probabilistically fused with spatial and temporal priors to infer layers accurately and efficiently. Central to our algorithm is that pixel velocities are not needed, which removes the need for optical flow estimation, itself error-prone and computationally expensive. Instead, an efficient motion vs. non-motion classifier is trained to operate directly and jointly on intensity change and contrast. Its output is then fused with colour information. The prior on segmentation is represented by a second-order temporal Hidden Markov Model, together with a spatial Markov random field (MRF) favouring coherence except where contrast is high. Finally, accurate layer segmentation and explicit occlusion detection are efficiently achieved by binary graph cut.
The segmentation accuracy of the proposed algorithm is quantitatively evaluated against existing ground-truth data and found to be comparable to that of a state-of-the-art stereo segmentation algorithm. Foreground/background segmentation is demonstrated in the application of live background substitution and shown to generate composite video of convincingly good quality.
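
As an illustration of a spatial prior that favours coherence except where contrast is high, the following minimal sketch computes contrast-sensitive smoothness weights between neighbouring pixels on a 4-connected grid. The exponential form, the function name `contrast_sensitive_weights`, and the parameters `gamma` and `eps` are assumptions made for illustration only; the abstract does not specify the exact form of the prior.

```python
import numpy as np

def contrast_sensitive_weights(image, gamma=10.0, eps=1e-8):
    """Return (horizontal, vertical) smoothness weights for a 4-connected grid.

    image: float array of shape (H, W) or (H, W, C).
    gamma: hypothetical strength of the coherence prior (an assumption).
    eps:   small constant guarding against division by zero on flat images.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 2:
        img = img[:, :, None]

    # Squared colour difference between each pixel and its right / lower neighbour.
    dh = np.sum((img[:, 1:] - img[:, :-1]) ** 2, axis=2)  # shape (H, W-1)
    dv = np.sum((img[1:, :] - img[:-1, :]) ** 2, axis=2)  # shape (H-1, W)

    # Normalise by the mean squared difference so the weights adapt to
    # the overall contrast of the image (an assumed, commonly used choice).
    mean_sq = (dh.sum() + dv.sum()) / max(dh.size + dv.size, 1)
    beta = 1.0 / (2.0 * mean_sq + eps)

    # High contrast gives a small weight, so a label change there is cheap;
    # low contrast gives a weight near gamma, so a label change is penalised.
    wh = gamma * np.exp(-beta * dh)
    wv = gamma * np.exp(-beta * dv)
    return wh, wv

# Usage: a 4x4 image with a vertical step edge between columns 1 and 2.
step = np.zeros((4, 4))
step[:, 2:] = 1.0
wh, wv = contrast_sensitive_weights(step)
```

In a graph-cut formulation, weights of this kind would typically serve as the capacities of the edges linking neighbouring pixels, so that the minimum cut prefers to pass along high-contrast boundaries.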