Accurate, Robust, and Flexible Real-time Hand Tracking

CHI '15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems

Published by ACM

Best of CHI Honorable Mention Award


We present a new real-time hand tracking system based on a single depth camera. The system can accurately reconstruct complex hand poses across a variety of subjects. It also allows for robust tracking, rapidly recovering from any temporary failures. Most notably, our tracker is highly flexible, dramatically improving upon previous approaches that have focused on front-facing, close-range scenarios. This flexibility opens up new possibilities for human-computer interaction, including tracking at distances from tens of centimeters through to several meters (for controlling the TV at a distance), tracking with a moving depth camera (for mobile scenarios), and arbitrary camera placements (for VR headsets). These features are achieved through a new pipeline that combines a multi-layered discriminative reinitialization strategy for per-frame pose estimation with a subsequent generative model-fitting stage. We provide extensive technical details and a detailed qualitative and quantitative analysis.
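The two-stage pipeline described above can be illustrated schematically: each frame, coarse pose hypotheses from a discriminative predictor are pooled with the previous frame's pose, and each candidate is refined by minimizing a model-fitting energy against the observed depth. The sketch below is a toy illustration, not the paper's implementation: `energy`, `discriminative_proposals`, and the stochastic local search standing in for the paper's rendered-model optimizer are all hypothetical simplifications.

```python
import random

# Toy stand-in for a depth frame: the pose parameters it encodes (hypothetical).
TRUE_POSE = [0.3, -0.2, 0.8]

def energy(pose, depth_frame):
    # Model-fitting energy: discrepancy between the hypothesized hand pose and
    # the observation. Here a toy squared distance; the paper compares a
    # rendered hand model against the depth image.
    return sum((p - t) ** 2 for p, t in zip(pose, depth_frame))

def discriminative_proposals(depth_frame, n=5, rng=None):
    # Stand-in for the per-frame reinitializer: coarse pose guesses made from
    # the current frame alone, independent of any previous tracking state.
    rng = rng or random
    return [[t + rng.uniform(-0.5, 0.5) for t in depth_frame] for _ in range(n)]

def refine(pose, depth_frame, iters=200, rng=None):
    # Stand-in generative refinement: stochastic local search that perturbs
    # the pose and keeps improvements, with a shrinking step size.
    rng = rng or random
    best, best_e = list(pose), energy(pose, depth_frame)
    step = 0.2
    for _ in range(iters):
        cand = [p + rng.uniform(-step, step) for p in best]
        e = energy(cand, depth_frame)
        if e < best_e:
            best, best_e = cand, e
            step *= 0.95
    return best, best_e

def track_frame(depth_frame, prev_pose, rng=None):
    # Seed the optimizer with both the previous frame's pose (temporal
    # continuity) and fresh discriminative proposals (recovery from failure),
    # then keep whichever refined candidate fits the observation best.
    seeds = [list(prev_pose)] + discriminative_proposals(depth_frame, rng=rng)
    results = [refine(s, depth_frame, rng=rng) for s in seeds]
    return min(results, key=lambda r: r[1])
```

Seeding from discriminative proposals as well as the previous pose is what gives the tracker its rapid recovery: even after a total tracking failure, the per-frame proposals let the optimizer reinitialize without depending on the (now wrong) temporal seed.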

Publication Downloads

FingerPaint Dataset

August 28, 2015

The FingerPaint Dataset contains video sequences of several individuals performing hand gestures, captured by a depth camera. Ground-truth fingertip locations are included as an annotation for each frame of the video. This dataset was developed for the paper: T. Sharp et al. "Accurate, Robust, and Flexible Real-time Hand Tracking." In Proc. CHI, vol. 8. 2015. Users of the dataset are requested to cite this paper. Note: The hand poses shown in these images were chosen at random and are not intended to convey any meaning or cause any offence.

Download Data
