Body Part Recognition and the Development of Kinect


July 16, 2014


Jamie Shotton


In late 2010, Microsoft launched Xbox Kinect, an amazing new depth-sensing camera and a revolution in gaming where your body movements allow you to control the game. In this talk I’ll present a behind-the-scenes look at the development of Kinect, focusing on the depth camera, the challenges of human pose estimation, and the body part recognition algorithm that drives Kinect’s skeletal tracking pipeline. Body part recognition uses machine learning to efficiently classify the pixels coming from the Kinect camera into different parts of the body: head, left hand, right knee, etc. The approach was designed to be robust in two ways. Firstly, the system is trained on a vast and highly varied set of synthetic images to ensure that it works for all ages, body shapes and sizes, clothing and hair styles. Secondly, the recognition does not rely on any temporal information, which allows the system to initialize from arbitrary poses and prevents catastrophic loss of track.


Jamie Shotton

Jamie Shotton studied Computer Science at the University of Cambridge, and remained at Cambridge for his PhD in Computer Vision and Visual Object Recognition, graduating in 2007. He was awarded the Toshiba Fellowship and travelled to Japan to continue his research at the Toshiba Corporate Research & Development Center in Kawasaki. In 2008 he returned to the UK and started work at Microsoft Research Cambridge in the Machine Learning & Perception group.

His research interests include Object Recognition, Machine Learning, Human Pose Estimation, Gesture and Action Recognition, and Medical Imaging. He has published papers in all the major computer vision conferences and journals, with a focus on object detection by modelling contours, semantic scene segmentation exploiting both appearance and semantic context, and dense object part layout constraints. His demo on real-time semantic scene segmentation won the best demo award at CVPR 2008. More recently, he has investigated how ideas from visual object recognition and machine learning can be adapted to new application areas. In human pose estimation, he architected the human body part recognition algorithm that drives Xbox Kinect’s skeletal tracking pipeline. In the sphere of medical imaging, he has published papers on the automatic detection of organs and other anatomical structures from CT data, with a view to simplifying and speeding up the radiologist’s workflow.
