Random decision tree classification is used in a variety of applications, from speech recognition to Web search engines. Decision trees are used in the Microsoft Kinect vision pipeline to recognize human body parts and gestures for a more natural computer-user interface. Tree-based classification can be taxing in terms of both computational load and memory bandwidth. This makes highly optimized hardware implementations attractive, particularly given the strict power and form-factor limitations of embedded or mobile platforms. In this paper we present a complete architecture that interfaces the Kinect depth-image sensor to an FPGA-based implementation of the Forest Fire pixel classification algorithm. Key performance parameters, algorithmic improvements, and design trade-offs are discussed.