Surface Hub + Kinect: Interaction Beyond Touch and Collaboration Beyond Video Chat


December 7, 2015


Zhengyou Zhang




Large displays are becoming a commodity, and they are increasingly touch-enabled. In this talk, we describe ViiBoard (Vision-enhanced Immersive Interaction with Touch Board), a system built by adding a Kinect (depth sensor) next to a Surface Hub (large touch display). It consists of two parts.

The first part, VTouch, augments touch input with visual understanding of the user to improve interaction with a large touch-sensitive display such as Microsoft Surface Hub. A commodity color-plus-depth sensor such as Microsoft Kinect adds the visual modality and enables new interactions beyond touch. Through visual analysis, the system understands where the user is, who the user is, and what the user is doing even before the user touches the display. Such information is used to enhance interaction in multiple ways. For example, a user can use simple gestures to bring up menu items such as a color palette and a soft keyboard; menu items can be shown where the user is and can follow the user; hovering can show information to the user before the user commits to touch; the user can perform different functions (for example, writing and erasing) with different hands; and the user’s preference profile can be maintained, distinct from those of other users. User studies were conducted, and the users very much appreciated the value of these and other enhanced interactions.

The second part, ImmerseBoard, is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled by just an RGBD camera mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that, compared to standard video conferencing with a digital whiteboard, ImmerseBoard provides participants with a quantitatively better ability to estimate their remote partners’ eye gaze direction, gesture direction, intention, and level of agreement. Moreover, these quantitative capabilities translate qualitatively into a heightened sense of being together and a more enjoyable experience. ImmerseBoard’s form factor is suitable for practical and easy installation in homes and offices.


Zhengyou Zhang

Zhengyou Zhang is a Principal Researcher with Microsoft Research, Redmond, WA, USA, and the Research Manager of the “Multimedia, Interaction, and Experiences” group. From 1990 to 1998, he was a Senior Research Scientist with INRIA, France. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher with ATR, Kyoto, Japan. He has published over 200 papers in refereed international journals and conferences, and has coauthored five books. He holds more than 100 US patents and has about 20 patents pending. He also holds a few Japanese patents for his inventions during his sabbatical at ATR.

Dr. Zhang is an IEEE Fellow and an ACM Fellow. He is the Founding Editor-in-Chief of the IEEE Transactions on Autonomous Mental Development, and has served on the editorial boards of IEEE TPAMI, IEEE TCSVT, IEEE TMM, IJCV, IJPRAI, and MVA, among others. He has served as a program chair, a general chair, and a program committee member for numerous international conferences in the areas of computer vision, audio and speech signal processing, multimedia, human-computer interaction, and autonomous mental development. He is serving as a General Chair of the International Conference on Multimodal Interaction (ICMI) 2015, and a General Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. He received the IEEE Helmholtz Test of Time Award at ICCV 2013 for his paper published in 1999 on camera calibration, now known as Zhang’s method.