Higher-order models in Computer Vision

  • Pushmeet Kohli,
  • Carsten Rother

Chapter 3, in Image Processing and Analysis with Graphs: Theory and Practice

Published by CRC Press | 2012

ISBN: 9781439855072

Many computer vision problems, such as object segmentation, disparity estimation, and 3D reconstruction, can be formulated as pixel or voxel labeling problems. The conventional methods for solving these problems use pairwise Conditional and Markov Random Field (CRF/MRF) formulations [1], which allow for the exact or approximate inference of Maximum a Posteriori (MAP) solutions. MAP inference is performed using extremely efficient algorithms such as combinatorial methods (e.g., graph cut [2, 3, 4] or the BHS algorithm [5, 6]) or message-passing-based techniques (e.g., Belief Propagation (BP) [7, 8, 9] or Tree-Reweighted (TRW) message passing [10, 11]).

The classical formulations for image labeling problems represent all output elements using random variables. An example is the problem of interactive object cut-out, where each pixel is represented by a random variable which can take two possible labels: foreground or background. The conventionally used pairwise random field models introduce a statistical relationship between pairs of random variables, often only among the immediate 4 or 8 neighboring pixels. Although such models permit efficient inference, they have restricted expressive power. In particular, they are unable to enforce the high-level structural dependencies between pixels that have been shown to be extremely powerful for image labeling problems. For instance, while segmenting an object in 2D or 3D, we might know that all its pixels (or parts) are connected. Standard pairwise MRFs or CRFs are not able to guarantee that their solutions satisfy such a constraint. To overcome this problem, a global potential function is needed which assigns all such invalid solutions a zero probability or an infinite energy.
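The pairwise model just described can be made concrete with a small sketch. The following is only an illustration, not code from the chapter: the unary costs, the Potts smoothness weight `lam`, and the toy 1×3 "image" are all invented for the example, and the MAP labeling is found by brute force rather than by graph cut or message passing.

```python
import itertools

def pairwise_energy(labels, unary, pairs, lam=1.0):
    """Energy of a binary pairwise MRF: sum of per-pixel unary costs plus
    a Potts penalty lam for each neighbouring pair with differing labels."""
    e = sum(unary[i][labels[i]] for i in range(len(labels)))
    e += sum(lam for (i, j) in pairs if labels[i] != labels[j])
    return e

# Toy 1x3 "image": unary[i][l] is the (invented) cost of giving pixel i
# label l, with 0 = background and 1 = foreground.
unary = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]
pairs = [(0, 1), (1, 2)]   # 4-neighbour pixel pairs in a 1x3 grid

# Brute-force MAP: the labelling with minimum energy over all 2^3 labellings.
best = min(itertools.product((0, 1), repeat=3),
           key=lambda lab: pairwise_energy(lab, unary, pairs))
```

On real images the brute-force minimization is replaced by the efficient algorithms mentioned above (graph cut, BP, TRW), but the energy being minimized has exactly this unary-plus-pairwise form.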

Despite substantial work from several communities, pairwise MRF and CRF models have not been able to fully solve image labeling problems such as object segmentation. This has led researchers to question the expressive power of these classical pairwise energy function based formulations, which in turn has motivated the development of more sophisticated models. Along these lines, many have turned to higher-order models, which are more expressive and can capture the statistics of natural images more closely.

The last few years have seen the successful application of higher-order CRFs and MRFs to some low-level vision problems such as image restoration, disparity estimation, and object segmentation [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]. Researchers have used models composed of new families of higher-order potentials, i.e., potentials defined over cliques of three or more variables, which have higher modeling power and lead to more accurate models of the problem. Researchers have also investigated the incorporation of constraints such as connectivity of the segmentation in CRF and MRF models. This is done by including higher-order or global potentials1 that assign zero probability (infinite cost) to all label configurations that do not satisfy these constraints.
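As an illustration of what a structured higher-order potential can look like, the sketch below implements a label-consistency potential over a clique (e.g., the pixels of a superpixel) in the spirit of the Pn Potts family used in this literature, together with a robust variant whose cost grows with the number of disagreeing variables. The penalty `gamma` and truncation parameter `trunc` are invented for the example.

```python
def pn_potts(clique_labels, gamma=5.0):
    """Pn Potts-style clique potential: zero cost if every variable in the
    clique takes the same label, a constant penalty gamma otherwise."""
    return 0.0 if len(set(clique_labels)) == 1 else gamma

def robust_pn(clique_labels, gamma=5.0, trunc=3):
    """Robust variant: the cost grows linearly with the number of variables
    disagreeing with the majority label, truncated at gamma."""
    n_minority = len(clique_labels) - max(
        clique_labels.count(l) for l in set(clique_labels))
    return min(gamma, gamma * n_minority / trunc)
```

The strict version penalizes a single deviating pixel as heavily as a completely mixed clique; the robust version tolerates a few deviations, which is why such truncated potentials tend to be preferred in practice.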

One of the key challenges with respect to higher-order models is the question of efficiently inferring the Maximum a Posteriori (MAP) solution. Since inference in pairwise models is very well studied, one popular technique is to transform the problem back to a pairwise random field. Interestingly, any higher-order function can be converted to a pairwise one by introducing additional auxiliary random variables [5, 24]. Unfortunately, the number of auxiliary variables grows exponentially with the arity of the higher-order function; hence, in practice, only higher-order functions with a few variables can be handled efficiently. However, if the higher-order function contains some inherent “structure”, then it is indeed possible to practically perform MAP inference in a higher-order random field where each higher-order function may act on thousands of variables [25, 15, 26, 21]. We will review various examples of such potential functions in this chapter.

There is a close relationship between higher-order random fields and random field models containing latent variables [27, 19]. In fact, as we will see later in the chapter, any higher-order model can be written as a pairwise model with auxiliary latent variables and vice versa [27]. Such transformations enable the use of powerful optimization algorithms and can even yield globally optimal solutions for some problem instances. We will explain the connection between higher-order models and models containing latent variables using the problem of interactive foreground/background image segmentation as an example [28, 29].