Microsoft Research Blog

Artificial intelligence

  1. Photo Clip Art

    July 29, 2007

    We present a system for inserting new objects into existing photographs by querying a vast image-based object library, pre-computed using a publicly available Internet object database. The central goal is to shield the user from all of the arduous tasks typically involved in image compositing.…

  2. Recognizing Assembly Tasks Through Human Demonstration 

    June 30, 2007 | Jun Takamatsu, K. Ogawara, H. Kimura, and Katsushi Ikeuchi

    As one of the methods for reducing the work of programming, the Learning-from-Observation (LFO) paradigm has been heavily promoted. This paradigm requires the programmer only to perform a task in front of a robot and does not require expertise. In this paper, the LFO paradigm…

  3. Tree-based Classifiers for Bilayer Video Segmentation 

    June 17, 2007 | Pei Yin, Antonio Criminisi, John Winn, and M. Essa

    This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a…

  4. Incorporating On-demand Stereo for Real Time Recognition 

    June 17, 2007 | T. Deselaers, Antonio Criminisi, John Winn, and Ankur Agarwal

    A new method for localising and recognising hand poses and objects in real-time is presented. This problem is important in vision-driven applications where it is natural for a user to combine hand gestures and real objects when interacting with a machine. Examples include using a…

  5. Single View Point Omnidirectional Camera Calibration from Planar Grids 

    April 9, 2007 | Christopher Mei and Patrick Rives

    This paper presents a flexible approach for calibrating omnidirectional single viewpoint sensors from planar grids. These sensors are increasingly used in robotics where accurate calibration is often a prerequisite. Current approaches in the field are either based on theoretical properties and do not take into…

  6. Single-Histogram Class Models for Image Segmentation

    December 13, 2006 | F. Schroff, Antonio Criminisi, and A. Zisserman

    Histograms of visual words (or textons) have proved effective in tasks such as image classification and object class recognition. A common approach is to represent an object class by a set of histograms, each one corresponding to a training exemplar. Classification is then achieved by…

  7. Representation for Knot-Tying Tasks

    October 31, 2006

    The learning from observation (LFO) paradigm has been widely applied in various types of robot systems. It helps reduce the work of the programmer. However, the applications of available systems are limited to manipulation of rigid objects. Manipulation of deformable objects is rarely considered, because…

  8. Boosting-Based Multimodal Speaker Detection for Distributed Meetings 

    September 30, 2006

    Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based multimodal speaker detection (BMSD) algorithm. Instead of performing sound…

  9. Bilayer Segmentation of Live Video 

    June 17, 2006 | Antonio Criminisi, Geoffrey Cross, Andrew Blake, and Vladimir Kolmogorov

    This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and…

  10. TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-Class Object Recognition and Segmentation

    May 7, 2006 | Jamie Shotton, John Winn, Carsten Rother, and Antonio Criminisi

    This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which…