Abstract

We study the problem of action recognition from depth sequences
captured by commodity depth cameras. Because each sequence is recorded
with a single camera, noise and occlusion are common problems.
To deal with these issues, we extract semi-local features
called random occupancy pattern (ROP) features, which employ a novel
sampling scheme that effectively explores an extremely large sampling
space. We also use sparse coding to encode these features robustly.
The proposed approach does not require careful parameter tuning.
Its training is very fast because it uses a high-dimensional integral
image, and it is robust to occlusions. Our technique is evaluated on
two datasets captured by commodity depth cameras: an action dataset
and a hand gesture dataset. Our classification results are superior to
those obtained by state-of-the-art approaches on both datasets.
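The fast training attributed above to a high-dimensional integral image rests on one property: once the table is built, the sum over any axis-aligned box costs a fixed 2^d lookups (16 for a 4D volume), regardless of the box's size. The following is a minimal sketch, not the authors' implementation. It assumes the depth sequence has already been converted to a binary 4D occupancy volume indexed as (t, x, y, z). The helper names, the uniform box sampler, and the sigmoid scale `beta` are hypothetical illustrations.

```python
import itertools

import numpy as np


def integral_image(volume: np.ndarray) -> np.ndarray:
    """Zero-padded d-dimensional integral image (summed-area table).

    ii[i0, ..., i_{d-1}] equals volume[:i0, ..., :i_{d-1}].sum().
    """
    # Pad one zero slice at the start of every axis, then cumulative-sum
    # along each axis in turn; float64 avoids overflow on uint8 input.
    ii = np.pad(volume.astype(np.float64), [(1, 0)] * volume.ndim)
    for axis in range(volume.ndim):
        ii = np.cumsum(ii, axis=axis)
    return ii


def box_sum(ii: np.ndarray, lo, hi) -> float:
    """Sum of volume[lo0:hi0, ..., lo_{d-1}:hi_{d-1}] via inclusion-exclusion."""
    d = ii.ndim
    total = 0.0
    # Visit all 2^d corners; a corner built from k "lo" coordinates gets sign (-1)^k.
    for corner in itertools.product((0, 1), repeat=d):
        idx = tuple(h if c else l for c, l, h in zip(corner, lo, hi))
        sign = (-1) ** (d - sum(corner))
        total += sign * ii[idx]
    return float(total)


def random_box(shape, rng: np.random.Generator):
    """Hypothetical sampler: a uniformly random, non-empty box in the volume."""
    lo, hi = [], []
    for n in shape:
        a, b = sorted(rng.choice(n + 1, size=2, replace=False))
        lo.append(int(a))
        hi.append(int(b))
    return tuple(lo), tuple(hi)


def rop_feature(ii: np.ndarray, lo, hi, beta: float = 1.0) -> float:
    """Occupancy of one sampled box squashed by a sigmoid (beta is an assumption)."""
    return 1.0 / (1.0 + np.exp(-beta * box_sum(ii, lo, hi)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy binary occupancy volume with shape (t, x, y, z).
    occupancy = (rng.random((4, 8, 8, 6)) > 0.7).astype(np.uint8)
    ii = integral_image(occupancy)
    features = [rop_feature(ii, *random_box(occupancy.shape, rng)) for _ in range(5)]
    print(features)
```

Building the table costs one pass per axis. After that, scoring a very large number of randomly sampled boxes is cheap, which is what makes searching an extremely large sampling space practical.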