We publish a subset of the data from the paper “Discriminative Ferns Ensemble for Hand Pose Recognition”. To receive a download link for the dataset, please send your request to ThreeHandPose@microsoft.com.
The data comprises 80,000 hand pose images of several subjects, collected by the Xbox One team. Each image consists of three channels, each 36 by 36 pixels: a binary mask, masked IR, and masked depth. Every image shows a hand in one of four classes: open, closed, lasso, or negative (a hand holding a random object). Only the hand is visible. The data is split into two binary files, Trn.bin and Tst.bin, one containing 60,000 images and the other containing 20,000.
Each file is formatted as follows:
- First, the 36×36 mask images for all instances, stored columns first, with each pixel a 4-byte float.
- Second, the 36×36 IR images for all instances, stored columns first, with each pixel a 4-byte float.
- Third, the 36×36 depth images for all instances, stored columns first, with each pixel a 4-byte float.
- Finally, labels for all instances, each label being a 4-byte integer. (1 – open, 2 – closed, 3 – lasso, 4 – negative)
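
The layout above can be read with a few lines of Python. This is a minimal sketch, not an official loader, and it rests on several assumptions that the description does not state: that NumPy is available, that values are little-endian (IEEE 754 single-precision floats and 32-bit signed integers), and that "columns first" means each image is stored column by column. The function name `load_hand_pose_file` is hypothetical. Since the files carry no header, the number of instances is inferred from the file size: each instance occupies 3 × 36 × 36 × 4 bytes of pixel data plus 4 bytes for its label.

```python
import os
import numpy as np

SIDE = 36                                  # images are 36 by 36 pixels
PIXELS = SIDE * SIDE
BYTES_PER_INSTANCE = 3 * PIXELS * 4 + 4    # three float channels plus one int label

def load_hand_pose_file(path):
    """Read one file (e.g. Trn.bin or Tst.bin) into mask, IR, depth and labels.

    Assumption: little-endian byte order; adjust the dtypes if the data
    looks wrong after loading.
    """
    size = os.path.getsize(path)
    if size % BYTES_PER_INSTANCE != 0:
        raise ValueError("file size does not match the expected layout")
    n = size // BYTES_PER_INSTANCE

    with open(path, "rb") as f:
        # All mask images, then all IR images, then all depth images.
        channels = np.fromfile(f, dtype="<f4", count=3 * n * PIXELS)
        # Labels for all instances come last.
        labels = np.fromfile(f, dtype="<i4", count=n)

    channels = channels.reshape(3, n, SIDE, SIDE)
    # Assumption: pixels are stored column by column, so swap the last two
    # axes to index each image as [row, column].
    channels = channels.transpose(0, 1, 3, 2)
    mask, ir, depth = channels
    return mask, ir, depth, labels
```

Each returned image array has shape (number of instances, 36, 36), and `labels` holds the values 1 to 4 listed above. If the images appear transposed when displayed, the column-first assumption is the first thing to revisit.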