The included source code provides a simple slide-show application that processes gesture recognition events from the runtime DLL to navigate through a series of images drawn from the ‘Pictures’ folders on your machine. The sample can track and display multiple people: the nearest person (shown in black) controls the slide show, and those not in control are shown in gray. A person who successfully completes a gesture is temporarily shown in red.
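The control rule described above (nearest person controls, others are passive, a completed gesture flashes red) can be illustrated with a short sketch. This is not the sample's C# code: the `TrackedPerson` type, its fields, and the choice to let red take precedence over black and gray are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrackedPerson:
    """Hypothetical stand-in for one tracked skeleton."""
    tracking_id: int                      # hypothetical identifier
    depth_m: float                        # assumed distance from the sensor, in meters
    gesture_just_completed: bool = False  # assumed flag set briefly after a gesture


def controlling_person(people: List[TrackedPerson]) -> Optional[TrackedPerson]:
    """Return the person nearest the sensor, or None if nobody is tracked."""
    return min(people, key=lambda p: p.depth_m, default=None)


def display_colors(people: List[TrackedPerson]) -> Dict[int, str]:
    """Map each tracking id to the color used to draw that person."""
    controller = controlling_person(people)
    colors: Dict[int, str] = {}
    for person in people:
        if person.gesture_just_completed:
            colors[person.tracking_id] = "red"    # assumption: red overrides the others
        elif person is controller:
            colors[person.tracking_id] = "black"  # nearest person is in control
        else:
            colors[person.tracking_id] = "gray"   # everyone else is passive
    return colors
```

With three people at 2.5 m, 1.2 m, and 3.0 m, the person at 1.2 m would be drawn in black and the other two in gray until one of them completes a gesture.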
The sample was developed by our team, and uses research from the Machine Learning and Perception group to provide a Random Decision Forest (RDF) based gesture recognizer trained using machine learning algorithms designed by Sebastian Nowozin.
This sample was distributed beginning 21st May 2012 in the Kinect for Windows Developer Toolkit as a Visual Studio 2010 C# based solution, which includes the following:
- A runtime DLL which captures real-time Kinect information, processes it through an advanced gesture recognition library, and triggers gesture events.
- A C# source code sample showing how this runtime can be used to handle the events generated by actual gestures.
- A short background to the research showcased in this sample.
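To give a feel for how an application might react to the gesture events raised by the runtime, here is a minimal sketch of a slide show driven by two events. It is written in Python for brevity and does not reproduce the sample's C# API: the `SlideShow` class, the event names `navigate_left` and `navigate_right`, and the wrap-around behavior at either end are all assumptions.

```python
from typing import Callable, Dict, List


class SlideShow:
    """Hypothetical slide show that steps through a list of image paths."""

    def __init__(self, images: List[str]) -> None:
        if not images:
            raise ValueError("at least one image is required")
        self.images = list(images)
        self.index = 0

    def current(self) -> str:
        return self.images[self.index]

    def show_next(self) -> None:
        # Assumption: wrap around to the first image after the last one.
        self.index = (self.index + 1) % len(self.images)

    def show_previous(self) -> None:
        # Assumption: wrap around to the last image before the first one.
        self.index = (self.index - 1) % len(self.images)

    def on_gesture(self, gesture_name: str) -> None:
        """Handle one recognized gesture; unknown gestures are ignored."""
        handlers: Dict[str, Callable[[], None]] = {
            "navigate_right": self.show_next,     # hypothetical event name
            "navigate_left": self.show_previous,  # hypothetical event name
        }
        handler = handlers.get(gesture_name)
        if handler is not None:
            handler()
```

A caller would construct the show once, for example `SlideShow(["a.jpg", "b.jpg", "c.jpg"])`, and pass each recognized gesture name to `on_gesture` as events arrive.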
Random Decision Forests and Gesture Recognition
Gesture recognition enables natural interaction with computer systems. For many applications, such as games, gestures must be recognized in real time and with low latency. In this demo, we use a machine learning method to learn characteristic patterns of gestures from annotated training data. In particular, we use the skeletal joint information provided by the Kinect SDK to extract features describing the movements of distinct parts of the body. A machine learning system based on random decision forests interprets these features and decides, for each gesture to be recognized, whether it is present or absent. This process is computationally efficient and scales to large gesture vocabularies: extending the vocabulary requires only the collection and annotation of additional training data.
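The two stages described above, extracting movement features from skeletal joints and letting a forest of trees vote on whether a gesture is present, can be sketched roughly as follows. This is an illustration only, not the recognizer shipped with the sample: the joint names, the two features, and the hand-written one-level trees (standing in for trees that would normally be learned from annotated recordings) are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position
Frame = Dict[str, Joint]             # joint name -> position; names are hypothetical
Tree = Callable[[List[float]], bool]  # a tree maps features to a present/absent vote


def arm_features(frames: Sequence[Frame], hand: str, shoulder: str) -> List[float]:
    """Describe one arm's movement over a window of frames.

    Returns [horizontal hand displacement, mean hand height], both measured
    relative to the shoulder so the features do not depend on where the
    person stands.
    """
    rel = [(f[hand][0] - f[shoulder][0], f[hand][1] - f[shoulder][1]) for f in frames]
    horizontal_displacement = rel[-1][0] - rel[0][0]
    mean_height = sum(dy for _, dy in rel) / len(rel)
    return [horizontal_displacement, mean_height]


def make_stump(feature_index: int, threshold: float) -> Tree:
    """Build a one-level tree that votes 'present' above a threshold.

    In a trained forest the feature index and threshold would be learned;
    here they are hand-picked placeholders.
    """
    return lambda features: features[feature_index] > threshold


def forest_decides(trees: Sequence[Tree], features: List[float]) -> bool:
    """Report the gesture as present if a strict majority of trees vote for it."""
    votes = sum(1 for tree in trees if tree(features))
    return votes * 2 > len(trees)
```

A separate forest (or a separate output per forest) would be evaluated for each gesture in the vocabulary, which is why adding a gesture mainly means adding training data rather than new code.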
Case Study: Building a Gesture Recognizer
This application focuses on two gestures, one made with each arm, and maps them to events for navigating left and right. The gestures had to be defined carefully to avoid the ambiguity that arises when features overlap across gestures; this definition work took two business days of development. For the original recognizer, three people volunteered to be recorded performing both target gestures and also providing “noise”. Each session took 20 minutes, during which the volunteer performed each gesture three times. Afterwards, the recordings were carefully tagged to identify when the target gestures occurred, which took about half an hour per session. Processing the annotated recordings took about one and a half hours and produced the recognizer, containing the RDF decision trees, that is used in the sample. After analyzing the generated gesture recognizer, we decided that further recording sessions were needed. Another 12 people took part in a second round, in which recording and annotating their gestures took a full day. Finally, we spent several days testing the generated recognizer with further volunteers to ensure it performed well for users who were not in the original training set.