A PowerPoint plugin prototype that adds speech and gesture recognition for navigating slides and animations during a presentation.
The presenter performs one or more rehearsals while being recorded by a Kinect. Gestures and body movements are automatically extracted using an energy function, and the presenter can then optionally map selected gestures to specific PowerPoint slide navigation commands.
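A minimal sketch of what such energy-based gesture segmentation could look like, assuming the Kinect skeleton stream is available as per-frame 3D joint positions; the array layout, threshold, and function names are illustrative assumptions, not the prototype's actual code.

```python
import numpy as np

def motion_energy(frames):
    """Per-frame motion energy: summed squared joint displacement.

    frames: array of shape (T, J, 3) -- T frames, J tracked joints, xyz positions.
    Returns an array of length T - 1.
    """
    deltas = np.diff(frames, axis=0)        # joint displacement between consecutive frames
    return (deltas ** 2).sum(axis=(1, 2))   # one scalar energy value per transition

def extract_gesture_segments(frames, threshold=0.02, min_len=10):
    """Return (start, end) frame ranges where energy stays above the threshold.

    Segments shorter than min_len frames are discarded as incidental movement.
    """
    active = motion_energy(frames) > threshold
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                        # a high-energy segment begins
        elif not is_active and start is not None:
            if i - start >= min_len:
                segments.append((start, i))  # keep segments long enough to be gestures
            start = None
    if start is not None and len(active) - start >= min_len:
        segments.append((start, len(active)))
    return segments
```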
Speech recognition is handled separately but similarly: once the accompanying talking points are decided, the presenter can choose phrases that map to specific slides.
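A rough illustration of that phrase-to-slide mapping, assuming the recognizer reports hypothesis strings to a callback; the example phrases, the fuzzy-match threshold, and the goto_slide parameter are placeholders rather than the plugin's real API.

```python
from difflib import SequenceMatcher

# Rehearsed trigger phrases mapped to slide numbers (example values only).
PHRASE_TO_SLIDE = {
    "let's look at the results": 7,
    "moving on to related work": 12,
}

def match_phrase(hypothesis, mapping=PHRASE_TO_SLIDE, min_score=0.8):
    """Return the slide mapped to the closest rehearsed phrase, or None.

    Fuzzy matching tolerates small recognition errors; the high score
    requirement guards against spurious jumps mid-presentation.
    """
    best_slide, best_score = None, 0.0
    for phrase, slide in mapping.items():
        score = SequenceMatcher(None, hypothesis.lower(), phrase).ratio()
        if score > best_score:
            best_slide, best_score = slide, score
    return best_slide if best_score >= min_score else None

def on_recognized(hypothesis, goto_slide):
    """Speech-recognizer callback: navigate when a mapped phrase is recognized."""
    slide = match_phrase(hypothesis)
    if slide is not None:
        goto_slide(slide)  # e.g. SlideShowWindow.View.GotoSlide(slide) in the PowerPoint object model
```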
These spoken phrases and gestures are already part of the presentation's content delivery, so the presenter can focus on loosely following his or her script without also doing the busywork of managing slide navigation.
Findings:
- Speech recognition accuracy must be close to 100%, as having to repeat a phrase breaks the flow of the presentation. Combining speech and gesture recognition is more robust, but may still be insufficient.
- Wearing a noise-cancelling microphone (as opposed to a podium-mounted one) improves speech recognition accuracy.
- Depending on the presenter, gestures may be relatively featureless, resulting in less accurate recognition models. Dynamic presenters with broad and varied gestures can be recognized more accurately.
- The necessity of setting up a Kinect in the conference room and ensuring that it has a full view of the presenter at the appropriate distance hampers the usefulness of this tool. The value of gesture recognition seems marginal in this scenario, so it may be dropped in future iterations.
- Overall recognition accuracy is not yet good enough for real-world deployment. Since presenters do not consider slide management a major pain point, they will not adopt the tool until it is trivial to set up and recognition is near perfect.
People
Ambrosio Blanco
Principal Software Engineering Manager