By Janie Chang, Writer, Microsoft Research
From Oct. 7 to 10 in Cambridge, Mass., Microsoft researchers attending UIST 2012—the 25th Association for Computing Machinery Symposium on User Interface Software and Technology—will be sharing projects and ideas with an international gathering of scientists and practitioners focused on human-computer interaction (HCI).
The event is a key forum for HCI software and technology, providing an opportunity for researchers to share and learn about the latest advances in the field, from traditional and web interfaces to wearable computing, virtual and augmented reality, and computer-supported collaboration.
Researchers at Microsoft traditionally are active contributors to UIST, and this year is no exception: Microsoft researchers wrote or co-wrote 10 of the 62 technical papers being presented. Microsoft Research staff members also support the event by serving on the 2012 program committee, helping to organize individual programs, chairing various sessions, and hosting the annual Women of UIST Luncheon.
Microsoft Research’s work in HCI helps the company achieve its long-term vision of creating intuitive interfaces that not only revolutionize interactions between humans and computers, but that also empower people from all walks of life. Digits is one of several research projects presented during UIST 2012 that help further this vision.
The Gloves Come Off
Digits: Freehand 3D Interactions Anywhere Using a Wrist-Worn Gloveless Sensor describes Digits, a wrist-worn sensor that recovers the full 3-D pose of the user’s hand for freehand 3-D interactions on the move. The paper was co-authored by David Kim, a Microsoft Research Ph.D. Fellow from Newcastle University’s Culture Lab; Otmar Hilliges, Shahram Izadi, Alex Butler, and Jiawen Chen of Microsoft Research’s U.K.-based Cambridge lab; Iason Oikonomidis of Greece’s Foundation for Research & Technology; and Patrick Olivier of Newcastle University’s Culture Lab. Because only the wrist is instrumented, the user’s entire hand is left to interact freely without “data gloves,” input devices worn as gloves that are most often used in virtual-reality applications to facilitate tactile sensing and fine-motion control. The Digits prototype, whose electronics are self-contained on the user’s wrist, optically images the entirety of the user’s hand, enabling freehand interactions in a mobile setting.
“The Digits sensor doesn’t rely on external infrastructure,” Kim explains, “which means users are not bound to a fixed space. They can interact while moving from room to room or running down the street. This finally takes 3-D interaction outside the living room.”
Mobility always has been one of the research team’s goals. To enable ubiquitous 3-D spatial interaction anywhere, Digits had to be lightweight, consume little power, and have the potential to be as small and comfortable as a watch. At the same time, Digits had to deliver superior gesture sensing and “understand” the human hand, from wrist orientation to the angle of each finger joint, so that interaction would not be limited to 3-D points in space. Digits had to understand what the hand is trying to express—even while inside a pocket.
It was Kinect for Windows that first put Izadi’s team on the path to Digits. The team was intrigued by the possibilities for enabling natural 3-D interactions with bare hands, but with as much flexibility and accuracy as data gloves. They soon realized that hand tracking could be even more mobile and detailed.
“We decided to look for more direct ways to sense 3-D data on the hand,” Izadi recalls. “From a previous project, we knew that a simple laser line generator and camera can be used to measure distances within a specific area. We also knew that there were products on the market that could act as a cheap wireless infrared camera with the ability to sense the laser projection. The signal we got back from one such camera was very weak, but it was enough to convince us to pursue working on a lightweight and mobile hand tracker.”
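The distance measurement Izadi mentions can be illustrated with basic laser triangulation. The sketch below is not the Digits implementation; it assumes an idealized pinhole camera and a laser plane mounted a small baseline away and tilted toward the camera’s optical axis. The function names, the focal length, the baseline, and the tilt angle are all hypothetical values chosen for illustration.

```python
import math

def row_from_depth(depth_m, focal_px, baseline_m, tilt_rad):
    """Forward model: image row (pixels from the principal point) where the
    laser line lands for a surface at the given depth.

    Assumed geometry: the laser sits baseline_m from the camera along the
    image's vertical axis, and its plane tilts toward the optical axis by
    tilt_rad, so a lit point at depth z has height y = baseline - z * tan(tilt).
    """
    return focal_px * (baseline_m / depth_m - math.tan(tilt_rad))

def depth_from_row(row_px, focal_px, baseline_m, tilt_rad):
    """Invert the forward model: recover depth from the observed laser row."""
    denom = row_px + focal_px * math.tan(tilt_rad)
    if denom <= 0:
        raise ValueError("row is outside the range the laser plane can produce")
    return focal_px * baseline_m / denom

# Hypothetical setup: 600 px focal length, 2 cm baseline, 0.1 rad tilt.
row = row_from_depth(0.08, 600.0, 0.02, 0.1)
recovered = depth_from_row(row, 600.0, 0.02, 0.1)
print(f"laser row {row:.1f} px -> depth {recovered * 100:.1f} cm")
```

In this simplified model, a finger that moves closer to the wrist shifts the laser line to a different image row, and that shift alone is enough to recover its distance.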
The researchers began with an infrared (IR) camera and an IR laser line generator, but they soon found themselves adding new data sources to enable even richer interaction possibilities. At the moment, the Digits prototype is built entirely from off-the-shelf hardware and is rather bulky: An IR camera, an IR laser line generator, an IR diffuse illuminator, and an inertial-measurement unit (IMU) all track hand movements.
“These components enable us to quickly test and verify different ideas, configurations, and form factors without having to worry about the engineering side,” Kim says. “We need this kind of flexibility to fail and improve early and to quickly iterate the design. Ultimately, we would like to reduce Digits to the size of a watch that can be worn all the time. We want users to be able to interact spontaneously with their electronic devices using simple gestures and not even have to reach for their devices.”
It’s All About the Human Hand
One of the project’s main contributions is a real-time signal-processing pipeline that robustly samples key parts of the hand, such as the tips and lower regions of each finger. Other important research achievements are two kinematic models that enable full reconstruction of hand poses from just five key points. The project posed many challenges, but the team agrees that the hardest was extrapolating natural-looking hand motions from a sparse sampling of the key points sensed by the camera.
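To give a rough sense of how a full finger pose might be extrapolated from a single sampled point, here is a minimal sketch. It is not either of the paper’s kinematic models. It assumes a planar three-segment finger whose joints all bend as fixed fractions of one flexion parameter; the segment lengths and coupling ratios are hypothetical, picked only to make the example concrete.

```python
import math

# Hypothetical segment lengths in metres: proximal, middle, distal.
SEGMENTS = (0.045, 0.025, 0.020)
# Assumed coupling: each joint bends by this fraction of one flexion parameter.
COUPLING = (1.0, 1.0, 2.0 / 3.0)

def fingertip(flexion):
    """Forward kinematics: fingertip (x, y) relative to the knuckle."""
    x = y = angle = 0.0
    for length, ratio in zip(SEGMENTS, COUPLING):
        angle += ratio * flexion          # joint angles accumulate along the chain
        x += length * math.cos(angle)
        y -= length * math.sin(angle)     # the finger curls downward
    return x, y

def tip_distance(flexion):
    """Straight-line distance from the knuckle to the fingertip."""
    return math.hypot(*fingertip(flexion))

def solve_flexion(measured, lo=0.0, hi=math.pi / 2, iters=60):
    """Bisection search for the flexion that reproduces a measured distance.

    With the coupling above, the distance shrinks steadily as the finger
    curls from 0 to pi/2, so bisection on that interval is sufficient.
    """
    measured = min(max(measured, tip_distance(hi)), tip_distance(lo))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tip_distance(mid) > measured:
            lo = mid                      # still too straight: curl further
        else:
            hi = mid
    return 0.5 * (lo + hi)

flexion = solve_flexion(tip_distance(0.7))
joint_angles = [ratio * flexion for ratio in COUPLING]
print([round(math.degrees(a), 1) for a in joint_angles])
```

The coupling is what lets one measurement determine three joint angles. A real hand breaks this assumption often, which hints at why the team describes this extrapolation as the hardest part of the project.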
“We had to understand our own body parts first before we could formulate their workings mathematically,” Kim says. “We spent hours just staring at our fingers. We read dozens of scientific papers about the biomechanical properties of the human hand. We tried to correlate these five points with the highly complex motion of the hand. In fact, we completely rewrote each kinematic model about three or four times until we got it just right.”
The team agrees that the most exciting moment of the project came when team members saw the models succeed.
“At the beginning, the virtual hand often broke and collapsed. It was always very painful to watch,” Kim recalls. “Then, one day, we radically simplified the mathematical model, and suddenly, it behaved like a human hand. It felt absolutely surreal and immersive, like in the movie Avatar. That moment gave us a big boost!”
Digits is meant to be a general-purpose interaction platform. To prove the utility of the technology, both the Digits technical paper being presented during UIST 2012 and an accompanying video present interactive scenarios using Digits in a variety of applications, with particular emphasis on mobile scenarios, where it can interact with mobile phones and tablets. The researchers also experimented with eyes-free interfaces, which enable users to leave mobile devices in a pocket or purse and interact with them using hand gestures. Another exciting application area for Digits is in gaming. Currently, Kinect for Windows and commercial game consoles do not support finger tracking. Digits could be complementary to these existing sensing modalities; one option could be to combine Kinect’s full-body tracker with Digits’ high-fidelity freehand interaction.
At present, because of the technical challenges in sensing a full 3-D hand pose, most systems constrain the problem by limiting hand tracking to 2-D input only or by supporting interaction through surfaces and other tangible mediators.
“By understanding how one part of the body works and knowing what sensors to use to capture a snapshot,” Kim says, “Digits offers a compelling look at the possibilities of opening up the full expressiveness and dexterity of one of our body parts for mobile human-computer interaction.”
Microsoft Research Contributions to UIST 2012
Cliplets: Juxtaposing Still and Dynamic Imagery
Neel Joshi, Microsoft Research Redmond; Sisil Mehta, Georgia Institute of Technology; Steven Drucker, Microsoft Research Redmond; Eric Stollnitz, Microsoft Research Redmond; Hugues Hoppe, Microsoft Research Redmond; Matt Uyttendaele, Microsoft Research Redmond; and Michael Cohen, Microsoft Research Redmond.
Cross-Device Interaction via Micro-mobility and F-formations
Nicolai Marquardt, Microsoft Research Redmond and University of Calgary; Ken Hinckley, Microsoft Research Redmond; and Saul Greenberg, University of Calgary.
DejaVu: Integrated Support for Developing Interactive Camera-Based Programs
Jun Kato, Microsoft Research Asia; Sean McDirmid, Microsoft Research Asia; and Xiang Cao, Microsoft Research Asia.
Designing for Low-Latency Direct-Touch Input
Albert Ng, Stanford University; Julian Lepinski, University of Toronto; Daniel Wigdor, University of Toronto; Steven Sanders, Sanders Capital; and Paul Dietz, Microsoft.
Digits: Freehand 3D Interactions Anywhere Using a Wrist-Worn Gloveless Sensor
David Kim, Microsoft Research Cambridge and Newcastle University; Otmar Hilliges, Microsoft Research Cambridge; Shahram Izadi, Microsoft Research Cambridge; D. Alex Butler, Microsoft Research Cambridge; Jiawen Chen, Microsoft Research Cambridge; Iason Oikonomidis, Foundation for Research & Technology; and Patrick Olivier, Newcastle University.
DuploTrack: A Real-Time System for Authoring and Guiding Duplo Block Assembly
Ankit Gupta, University of Washington; Dieter Fox, University of Washington; Brian Curless, University of Washington; and Michael Cohen, Microsoft Research Redmond.
KinÊtre: Animating the World with the Human Body
Jiawen Chen, Microsoft Research Cambridge; Shahram Izadi, Microsoft Research Cambridge; and Andrew Fitzgibbon, Microsoft Research Cambridge.
Low-cost Audience Polling Using Computer Vision
Andrew Cross, Microsoft Research India; Edward Cutrell, Microsoft Research India; and William Thies, Microsoft Research India.
PICOntrol: Using a Handheld Projector for Direct Control of Physical Devices through Visible Light
Dominik Schmidt, Microsoft Research Asia and Lancaster University; David Molyneaux, Microsoft Research Cambridge and Lancaster University; and Xiang Cao, Microsoft Research Asia.
Steerable Augmented Reality with the Beamatron
Andy Wilson, Microsoft Research Redmond; Hrvoje Benko, Microsoft Research Redmond; Shahram Izadi, Microsoft Research Cambridge; and Otmar Hilliges, Microsoft Research Cambridge.