Efficient and Precise Interactive Hand Tracking through Joint, Continuous Optimization of Pose and Correspondences

Jonathan Taylor; Lucas Bordeaux; Tom Cashman; Bob Corish; Cem Keskin; Eduardo Soto; David Sweeney; Julien Valentin; Benjamin Luff; Arran Topalian; Erroll Wood; Sameh Khamis; Pushmeet Kohli; Toby Sharp; Shahram Izadi; Richard Banks; Andrew Fitzgibbon; Jamie Shotton

ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH 2016 | July 2016, Vol 35

Fully articulated hand tracking promises to enable fundamentally new interactions with virtual and augmented worlds, but the limited accuracy and efficiency of current systems have prevented widespread adoption. Today’s dominant paradigm uses machine learning for initialization and recovery, followed by iterative model-fitting optimization to achieve a detailed pose fit. We follow this paradigm, but make several changes to the model fitting, namely using: (1) a more discriminative objective function; (2) a smooth-surface model that provides gradients for non-linear optimization; and (3) joint optimization over both the model pose and the correspondences between observed data points and the model surface. While each of these changes may actually increase the cost per fitting iteration, we find a compensating decrease in the number of iterations. Further, the wide basin of convergence means that fewer starting points are needed for successful model fitting. Our system runs in real time on the CPU alone, which frees up the commonly over-burdened GPU for experience designers. The hand tracker is efficient enough to run on low-power devices such as tablets. We can track up to several meters from the camera to provide a large working volume for interaction, even using the noisy data from current-generation depth cameras. Quantitative assessments on standard datasets show that the new approach exceeds the state of the art in accuracy. Qualitative results take the form of live recordings of a range of interactive experiences enabled by this new approach.
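To make point (3) concrete, the following is a minimal toy sketch of joint, continuous optimization over pose and correspondences. It is not the paper's implementation. As a loud simplification, the "smooth surface" is assumed to be a 2D circle with pose `theta = (cx, cy, r)`, each data point gets a continuous correspondence `u_i` (an angle on the circle), and all unknowns are updated together by a basic Levenberg-Marquardt step written with NumPy. The names `surface`, `energy`, and `joint_fit` are hypothetical and chosen only for illustration.

```python
import numpy as np

def surface(theta, u):
    """Toy smooth model: points on a circle with pose theta = (cx, cy, r)."""
    cx, cy, r = theta
    return np.stack([cx + r * np.cos(u), cy + r * np.sin(u)], axis=1)

def energy(theta, u, pts):
    """Sum of squared distances from data points to their surface points."""
    return float(np.sum((surface(theta, u) - pts) ** 2))

def joint_fit(pts, theta0, iters=50, lam=1e-3):
    """Levenberg-Marquardt over pose AND correspondences simultaneously."""
    pts = np.asarray(pts, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    n = len(pts)
    idx = np.arange(n)
    # Initial correspondences: closest point on the initial circle.
    u = np.arctan2(pts[:, 1] - theta[1], pts[:, 0] - theta[0])
    e = energy(theta, u, pts)
    for _ in range(iters):
        # Residuals flattened as x0, y0, x1, y1, ...
        res = (surface(theta, u) - pts).reshape(-1)
        # Jacobian columns: cx, cy, r, then one column per correspondence u_i.
        J = np.zeros((2 * n, 3 + n))
        J[0::2, 0] = 1.0
        J[1::2, 1] = 1.0
        J[0::2, 2] = np.cos(u)
        J[1::2, 2] = np.sin(u)
        J[2 * idx, 3 + idx] = -theta[2] * np.sin(u)
        J[2 * idx + 1, 3 + idx] = theta[2] * np.cos(u)
        # Damped normal equations; the smooth model supplies every derivative.
        step = np.linalg.solve(J.T @ J + lam * np.eye(3 + n), -J.T @ res)
        theta_new, u_new = theta + step[:3], u + step[3:]
        e_new = energy(theta_new, u_new, pts)
        if e_new < e:
            # Accept the step and trust the quadratic model more.
            theta, u, e, lam = theta_new, u_new, e_new, lam * 0.5
        else:
            # Reject the step and increase damping.
            lam *= 10.0
    return theta, u, e

# Synthetic, slightly noisy data sampled from a known circle (illustrative values).
rng = np.random.default_rng(0)
angles = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
true_theta = np.array([1.0, -0.5, 2.0])
pts = surface(true_theta, angles) + 0.01 * rng.normal(size=(40, 2))

theta_fit, u_fit, e_fit = joint_fit(pts, theta0=[0.0, 0.0, 1.0])
print("estimated pose:", theta_fit, "final energy:", e_fit)
```

In this sketch the correspondences slide along the surface during every update rather than being recomputed by a separate closest-point search between pose updates. That is the sense in which the optimization is joint and continuous, and it relies on the model being smooth enough to differentiate with respect to both `theta` and `u`.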
