I lead the HoloLens Science team at Microsoft in Cambridge.  My research lies at the intersection of computer vision, AI, machine learning, and graphics, with a particular emphasis on systems that allow people to interact naturally with computers.


We’re hiring!  Looking for world-class engineers, post-docs, and researchers with expertise in computer vision, graphics, and machine learning.

Thrilled to be setting up a new research group to help invent the future for Microsoft HoloLens.

SIGGRAPH 2016 paper on efficient subdivision-surface based hand tracking.

CVPR 2016 paper on hand shape calibration.

Very excited to be chosen as an MIT Technology Review Innovator Under 35 2015!  More details in this article.

Research Highlights

Hand Pose Estimation.  Real-time, accurate, robust, and flexible articulated tracking of the human hand.

Decision Jungles.  Memory-efficient ensembles of rooted decision DAGs that reduce the memory footprint of decision forests while improving generalization.

Scene Coordinate Regression Forests.  A new approach to 6D camera pose estimation by regressing 3D scene coordinates.

Human pose estimation for Kinect.  Our work on human body part recognition for Kinect.
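Among the highlights above, scene coordinate regression reduces camera relocalization to aligning two 3D point sets: per-pixel scene coordinates predicted for the frame, and the corresponding camera-space points back-projected from depth. Below is a minimal numpy sketch of just the pose-solving step, using the Kabsch algorithm on noiseless synthetic correspondences; the function name and data are illustrative, and the full system uses a regression forest to predict the scene coordinates plus RANSAC to reject outliers.

```python
import numpy as np

def kabsch_pose(cam_pts, scene_pts):
    """Rigid transform (R, t) mapping camera-space points onto scene coordinates,
    minimizing the sum of squared distances (Kabsch algorithm)."""
    c_cam = cam_pts.mean(axis=0)
    c_scene = scene_pts.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (cam_pts - c_cam).T @ (scene_pts - c_scene)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_scene - R @ c_cam
    return R, t

# Synthetic check: recover a known rotation and translation.
rng = np.random.default_rng(0)
cam = rng.standard_normal((100, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
scene = cam @ R_true.T + t_true
R, t = kabsch_pose(cam, scene)   # recovers R_true, t_true
```

With outlier-contaminated forest predictions, this solver would sit inside a RANSAC loop: sample a few correspondences, solve, and score hypotheses by inlier count.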

Short Biography

Jamie Shotton is a Partner Scientist and leads the HoloLens Science team at Microsoft in Cambridge, UK, where his team focuses on the visual understanding of people to improve interaction and communication in mixed reality.  He studied Computer Science at the University of Cambridge, where he remained for his PhD in computer vision and machine learning. He joined Microsoft Research in 2008, where he was a research scientist and head of the Machine Intelligence & Perception group, before founding the HoloLens Science Cambridge team in 2016. His research lies at the intersection of computer vision, AI, machine learning, and graphics, with a particular emphasis on systems that allow people to interact naturally with computers. He has received multiple Best Paper and Best Demo awards at top-tier academic conferences. His work on machine learning for body part recognition for Kinect was awarded the Royal Academy of Engineering’s MacRobert Award 2011, and he shared Microsoft’s Outstanding Technical Achievement Award for 2012 with the Kinect engineering team. In 2014 he received the PAMI Young Researcher Award, and in 2015 the MIT Technology Review Innovator Under 35 Award (“TR35”).


SemanticPaint: Interactive 3D Labeling and Learning at your Fingertips

Established: June 29, 2015

We present a new interactive approach to 3D scene understanding. Our system, SemanticPaint, allows users to scan their environment while simultaneously segmenting the scene simply by reaching out and touching any desired object or surface. Our system continuously learns from these segmentations and labels new, unseen parts of the environment. Unlike offline systems, where capture, labeling, and batch learning often take hours or even days, our approach is fully online. To be…

Project Malmo

Established: June 1, 2015

How can we develop artificial intelligence that learns to make sense of complex environments? That learns from others, including humans, how to interact with the world? That learns transferable skills throughout its existence, and applies them to solve new, challenging problems? https://youtu.be/KkVj_ddseO8 Project Malmo sets out to address these core research challenges by integrating (deep) reinforcement learning, cognitive science, and many ideas from artificial intelligence. The Malmo platform is a sophisticated AI experimentation…

Fully Articulated Hand Tracking

Established: October 2, 2014

We present a new real-time articulated hand tracker that enables new possibilities for human-computer interaction (HCI). Our system accurately reconstructs complex hand poses across a variety of subjects using only a single depth camera. It is also highly robust, continually recovering from tracking failures. The most distinctive aspect of our tracker, however, is its flexibility in terms of camera placement and operating range. Screenshots: Please note, we are using a…

Learning to be a depth camera for close-range human capture and interaction

Established: July 14, 2014

We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera with minor hardware modifications. Our approach targets close-range human capture and interaction, where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction scenarios.
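The core learning problem here is a per-pixel mapping from near-infrared intensity to metric depth. Under a simple inverse-square illumination model, intensity falls off as 1/d², so even a closed-form regressor can recover depth once the falloff constant is calibrated. The toy sketch below shows only that physical intuition on noiseless synthetic data; it is not the paper's method, which uses hybrid classification-regression forests over image context to handle real materials and noise.

```python
import numpy as np

def fit_falloff(intensity, depth):
    """Fit k in I = k / d^2 by least squares on log data:
    log I = log k - 2 log d, so log k is the mean of (log I + 2 log d)."""
    logk = np.mean(np.log(intensity) + 2.0 * np.log(depth))
    return np.exp(logk)

def predict_depth(intensity, k):
    """Invert the falloff model: d = sqrt(k / I)."""
    return np.sqrt(k / intensity)

# Synthetic calibration data (hypothetical): close-range depths in metres.
rng = np.random.default_rng(2)
depth = rng.uniform(0.2, 0.7, size=1000)
intensity = 3.0 / depth**2        # true k = 3.0, noiseless for the sketch

k = fit_falloff(intensity, depth)
d_hat = predict_depth(intensity, k)   # matches depth on this clean data
```

On real sensor data the forest replaces this single global model with per-leaf regressors conditioned on local image context.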

Filter Forests for Learning Data-Dependent Convolutional Kernels

Established: February 10, 2014

We propose ‘filter forests’ (FF), an efficient new discriminative approach for predicting continuous variables given a signal and its context. FF can be used for general signal restoration tasks that can be tackled via convolutional filtering, where it attempts to learn the optimal filtering kernels to be applied to each data point. The model can learn both the size of the kernel and its values, conditioned on the observation and its spatial or temporal context.…
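At each leaf of a filter forest, the kernel applied to a data point's neighbourhood can be estimated in closed form, e.g. by regularized least squares over the training patches routed to that leaf. The sketch below shows only that per-leaf estimation step on synthetic data; the function name and data are illustrative, and the paper additionally learns the tree structure and per-leaf kernel sizes.

```python
import numpy as np

def learn_leaf_kernel(patches, targets, lam=1e-3):
    """Ridge-regression estimate of a filtering kernel for one leaf:
    w = (X^T X + lam I)^-1 X^T y, one weight per patch element."""
    X = patches.reshape(len(patches), -1)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ targets)

# Synthetic restoration data: the target is the response of a known
# smoothing filter, which the leaf should recover from examples.
rng = np.random.default_rng(1)
true_kernel = np.array([0.25, 0.5, 0.25])
patches = rng.standard_normal((500, 3))
targets = patches @ true_kernel

w = learn_leaf_kernel(patches, targets, lam=1e-6)  # w is close to true_kernel
```

Because different leaves see different kinds of patches, each leaf learns a different kernel, which is what makes the filtering data-dependent.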

Decision Forests

Established: July 25, 2012

A. Criminisi and J. Shotton, Decision Forests for Computer Vision and Medical Image Analysis. Springer, 2013. XIX, 368 pp., 143 illus. (136 in color). ISBN 978-1-4471-4929-3.

KinectFusion Project Page

Established: August 9, 2011

This project investigates techniques to track the 6DOF position of handheld depth-sensing cameras, such as Kinect, as they move through space, and to perform high-quality 3D surface reconstruction for interaction. Other collaborators (missing from the list below): Richard Newcombe (Imperial College London); David Kim (Newcastle University & Microsoft Research); Andy Davison (Imperial College London).
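KinectFusion's reconstruction side maintains a truncated signed distance function (TSDF) volume, folding each depth frame in as a weighted running average per voxel. Below is a minimal single-voxel sketch of that update rule under assumed names and constants; the real system performs this per voxel on the GPU, along each camera ray, with per-frame weights.

```python
import numpy as np

MU = 0.1  # truncation band in metres: distances beyond +/- MU are clamped

def fuse(tsdf, weight, sdf):
    """Fold one frame's signed distances into the volume:
    a weighted running average of truncated, normalized distances."""
    d = np.clip(sdf / MU, -1.0, 1.0)            # truncated, normalized SDF
    tsdf = (tsdf * weight + d) / (weight + 1.0)  # running average
    return tsdf, weight + 1.0

# One voxel observed three times, sitting ~5 cm in front of the surface.
tsdf, weight = np.zeros(1), np.zeros(1)
for sdf in (0.04, 0.06, 0.05):
    tsdf, weight = fuse(tsdf, weight, np.array([sdf]))
# tsdf converges to 0.5 (= 0.05 / MU), averaging out the per-frame noise
```

The zero-crossing of the fused TSDF is the reconstructed surface, which the tracker then aligns each new frame against.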

Human Pose Estimation for Kinect

Established: January 25, 2011

Kinect for Xbox 360 and Windows makes you the controller by fusing 3D imaging hardware with markerless human-motion capture software. Our group investigates such software. Mixing computer vision, graphics, and machine learning techniques, we look at how to build algorithms that can learn to recognize human poses quickly and reliably. Images: traditional RGB image; image from the new depth-sensing camera; body parts inferred by our recognition algorithm; 3D body part position proposals. Related Press: Binary…
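The per-pixel body part classifier behind this work is built on very cheap depth-difference features, f(x) = d(x + u/d(x)) − d(x + v/d(x)), where the probe offsets are scaled by the inverse depth at x so the feature responds at the same body scale at any range. A minimal numpy sketch on a synthetic depth image follows; the constants and scene are illustrative, and the real system evaluates thousands of such features inside a decision forest.

```python
import numpy as np

BACKGROUND = 1e6  # large depth value for background / out-of-image probes

def depth_feature(depth, x, y, u, v):
    """Depth-difference feature f = d(x + u/d(x)) - d(x + v/d(x)).
    Offsets u, v are divided by d(x) for depth invariance."""
    d = depth[y, x]
    def probe(off):
        px = x + int(round(off[0] / d))
        py = y + int(round(off[1] / d))
        if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
            return depth[py, px]
        return BACKGROUND
    return probe(u) - probe(v)

# Synthetic scene: a flat "body" region at 2 m in front of distant background.
depth = np.full((48, 48), BACKGROUND)
depth[8:40, 16:32] = 2.0

f_edge = depth_feature(depth, 24, 24, u=(10.0, 0.0), v=(-50.0, 0.0))  # one probe off the body
f_flat = depth_feature(depth, 24, 24, u=(10.0, 0.0), v=(-10.0, 0.0))  # both probes on the body
```

Large responses like `f_edge` fire near silhouette edges while flat regions give responses near zero, which is exactly the kind of split test a decision forest can threshold on.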

Image Understanding

Established: January 1, 2000

At Microsoft Research in Cambridge we are developing new machine vision algorithms for automatic recognition and segmentation of many different object categories. We are interested in both the supervised and unsupervised scenarios. Research data: download labelled image databases for supervised learning via the “Downloads” link below. The data provided here may be used freely for research purposes, but cannot be used for commercial purposes. Database of thousands of weakly labelled, high-res images. Pixel-wise labelled…


(MSR Labs) B.Pixel-wise labelled image database v1

October 2016

A database of thousands of weakly labelled, high-res images used as part of a Microsoft Research project to automate image recognition.

(MSR Labs) B.Pixel-wise labelled image database v2

October 2016

A database of thousands of weakly labelled, high-res images used as part of a Microsoft Research project to automate image recognition.



  • Tutorial on Decision Forests and Fields as presented at ICCV 2013.
  • 7-Scenes RGB-D camera relocalization dataset now available.
  • Decision Forests book including tutorial and software available here.

Former Interns