Abstract

In this paper , we present a novel, general, and efficient architecture for addressing computer vision problems that are approached from an ‘Analysis by Synthesis’ standpoint. Analysis by synthesis involves the minimization of reconstruction error, which is typically a non-convex function of the latent target variables. State-of-the-art methods adopt a hybrid scheme where discriminatively trained predictors like Random Forests or Convolutional Neural Networks are used to initialize local search algorithms. While these hybrid methods have been shown to produce promising results, they often get stuck in local optima. Our method goes beyond the conventional hybrid architecture by not only proposing multiple accurate initial solutions but by also defining a navigational structure over the solution space that can be used for extremely efficient gradient-free local search. We demonstrate the efficacy and generalizability of our approach by on tasks as diverse as Hand Pose Estimation, RGB Camera Relocalization, and Image Retrieval.