Microsoft Research Blog

Artificial intelligence

Photo clip art

July 29, 2007

We present a system for inserting new objects into existing photographs by querying a vast image-based object library, pre-computed using a publicly available Internet object database. The central goal is to shield the user from all of the arduous tasks typically involved in image compositing.…
Recognizing Assembly Tasks Through Human Demonstration

June 30, 2007 | Jun Takamatsu, K. Ogawara, H. Kimura, and Katsushi Ikeuchi

As one of the methods for reducing the work of programming, the Learning-from-Observation (LFO) paradigm has been heavily promoted. This paradigm requires the programmer only to perform a task in front of a robot and does not require expertise. In this paper, the LFO paradigm…
Tree-based Classifiers for Bilayer Video Segmentation

June 17, 2007 | Pei Yin, Antonio Criminisi, John Winn, and M. Essa

This paper presents an algorithm for the automatic segmentation of monocular videos into foreground and background layers. Correct segmentations are produced even in the presence of large background motion with nearly stationary foreground. There are three key contributions. The first is the introduction of a…
Incorporating On-demand Stereo for Real Time Recognition

June 17, 2007 | T. Deselaers, Antonio Criminisi, John Winn, and Ankur Agarwal

A new method for localising and recognising hand poses and objects in real-time is presented. This problem is important in vision-driven applications where it is natural for a user to combine hand gestures and real objects when interacting with a machine. Examples include using a…
Single View Point Omnidirectional Camera Calibration from Planar Grids

April 9, 2007 | Christopher Mei and Patrick Rives

This paper presents a flexible approach for calibrating omnidirectional single viewpoint sensors from planar grids. These sensors are increasingly used in robotics where accurate calibration is often a prerequisite. Current approaches in the field are either based on theoretical properties and do not take into…
Efficient Dense Stereo with Occlusion for New View-Synthesis by Four-State Dynamic Programming

January 1, 2007

A new algorithm is proposed for efficient stereo and novel view synthesis. Given the video streams acquired by two synchronized cameras, the proposed algorithm synthesises images from a virtual camera in arbitrary position near the physical cameras. The new technique is based on an improved,…
Single-Histogram class models for image segmentation

December 13, 2006 | F. Schroff, Antonio Criminisi, and A. Zisserman

Histograms of visual words (or textons) have proved effective in tasks such as image classification and object class recognition. A common approach is to represent an object class by a set of histograms, each one corresponding to a training exemplar. Classification is then achieved by…
Representation for knot-tying tasks

October 31, 2006

The learning from observation (LFO) paradigm has been widely applied in various types of robot systems. It helps reduce the work of the programmer. However, the applications of available systems are limited to manipulation of rigid objects. Manipulation of deformable objects is rarely considered, because…
Boosting-Based Multimodal Speaker Detection for Distributed Meetings

September 30, 2006

Speaker detection is a very important task in distributed meeting applications. This paper discusses a number of challenges we met while designing a speaker detector for the Microsoft RoundTable distributed meeting device, and proposes a boosting-based multimodal speaker detection (BMSD) algorithm. Instead of performing sound…
Discriminative Object Class Models of Appearance and Shape by Correlatons

June 17, 2006 | S. Savarese, Antonio Criminisi, and John Winn

This paper presents a new model of object classes which incorporates appearance and shape information jointly. Modeling object appearance by distributions of visual words has recently proven successful. Here appearance-based models are augmented by capturing the spatial arrangement of visual words. Compact spatial modeling without…
Bilayer Segmentation of Live Video

June 17, 2006 | Antonio Criminisi, Geoffrey Cross, Andrew Blake, and Vladimir Kolmogorov

This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and…
TextonBoost: joint appearance, shape and context modeling for multi-class object recognition and segmentation

May 7, 2006 | Jamie Shotton, John Winn, Carsten Rother, and Antonio Criminisi

This paper proposes a new approach to learning a discriminative model of object classes, incorporating appearance, shape and context information efficiently. The learned model is used for automatic visual recognition and semantic segmentation of photographs. Our discriminative model exploits novel features, based on textons, which…
