Inference and Learning in Structured-Output Models for Computer Vision


February 21, 2012


Dhruv Batra


Toyota Technological Institute at Chicago (TTIC)


Many problems in computer vision involve predictions over exponentially (or infinitely) large structured-output spaces, e.g., the space of segmentations of an image, the space of all object-part hierarchies in a context-free grammar, or the space of all pixel-level depth predictions.

To build intelligent vision systems that can reason about these tasks, we must address the challenges of 1) representation: how do we store and represent beliefs over exponentially (or infinitely) large output spaces? 2) learning: how do we learn these beliefs from data? 3) inference: how do we predict under these beliefs? and 4) their interactions: the richer the model, the harder both learning and inference become. In this talk, I will present a sampling of my recent work that addresses some of these challenges.

While much progress has been made on the “static” version of the maximum a posteriori (MAP) inference problem, a number of situations require dynamic inference algorithms that adapt and reorder computation to focus on “important” parts of the problem. I will present a novel measure for identifying such important parts and demonstrate how it speeds up inference algorithms in a variety of settings.

Next, I will talk about our recent work on the M-Best-Mode problem, which involves extracting not just the most probable solution, but a /diverse/ set of the M most probable solutions in discrete graphical models (such as MRFs/CRFs). Extracting the top M modes of the distribution allows us to better exploit the beliefs that our model holds.
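
As a rough illustration of what a diverse set of low-energy solutions can look like, here is a minimal sketch on a toy problem. It is not the algorithm presented in the talk: it enumerates all labelings of a tiny chain model by brute force and picks solutions greedily, penalizing agreement with the solutions already chosen. The function names, the Potts pairwise term, and the `diversity_weight` parameter are all assumptions made for this example.

```python
from itertools import product


def energy(labels, unary, pairwise_weight):
    """Energy of a labeling on a chain: unary costs plus a Potts penalty
    for every pair of neighboring variables that disagree."""
    cost = sum(unary[i][label] for i, label in enumerate(labels))
    cost += pairwise_weight * sum(a != b for a, b in zip(labels, labels[1:]))
    return cost


def diverse_m_best(unary, pairwise_weight, m, diversity_weight):
    """Greedy, brute-force sketch (hypothetical helper, toy sizes only).

    The first solution is the minimum-energy (MAP) labeling. Each later
    solution minimizes energy plus diversity_weight times the number of
    variables on which it agrees with previously chosen solutions.
    """
    num_labels = len(unary[0])
    candidates = list(product(range(num_labels), repeat=len(unary)))
    solutions = []
    for _ in range(m):
        def score(labels):
            # Count per-variable agreements with every earlier solution.
            agreement = sum(
                sum(a == b for a, b in zip(labels, prev)) for prev in solutions
            )
            return energy(labels, unary, pairwise_weight) + diversity_weight * agreement

        best = min((c for c in candidates if c not in solutions), key=score)
        solutions.append(best)
    return solutions


# Toy chain with three binary variables; label 0 is cheaper everywhere.
toy_unary = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
toy_solutions = diverse_m_best(toy_unary, pairwise_weight=0.5, m=2, diversity_weight=2.0)
print(toy_solutions)
```

With `diversity_weight` set to 0 the sketch simply returns the M lowest-energy labelings, which tend to differ from the MAP labeling in only a few variables; a larger weight pushes later solutions away from earlier ones.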

Joint work with Pushmeet Kohli (MSRC), Vladimir Kolmogorov (IST), Sebastian Nowozin (MSRC), Greg Shakhnarovich (TTIC), Ashutosh Saxena (Cornell), Daniel Tarlow (UToronto) and Payman Yadollahpour (TTIC).


Dhruv Batra

Dhruv Batra is a Research Assistant Professor at the Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute affiliated with the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010, respectively, advised by Tsuhan Chen. He has previously held visiting positions at Cornell University and MIT.

His research interests include computer vision, machine learning, and applications of combinatorial optimization algorithms to learning and vision tasks. Specifically, he is interested in structured-output prediction, MAP inference in MRFs, max-margin methods, co-segmentation in multiple images, and interactive 3D modeling.