TechFest Focus: Natural User Interfaces
By Douglas Gantenbein | March 8, 2011 9:00 AM PT
For many people, using a computer still means using a keyboard and a mouse. But computers are becoming more like “us”—better able to anticipate human needs, work with human preferences, even work on our behalf.
Computers, in short, are moving rapidly toward widespread adoption of natural user interfaces (NUIs)—interfaces that are more intuitive, that are easier to use, and that adapt to human habits and wishes, rather than forcing humans to adapt to computers. Microsoft has been a driving force behind the adoption of NUI technology. The wildly successful Kinect for Xbox 360 device—launched in November 2010—is a perfect example. It recognizes users, needs no controller to work, and understands what the user wants to do.
It won’t be long before more and more devices work in similar fashion. Microsoft Research is working closely with Microsoft business units to develop new products that take advantage of NUI technology. In the months and years to come, a growing number of Microsoft products will recognize voices and gestures, read facial expressions, and make computing easier, more intuitive, and more productive.
TechFest 2011, Microsoft Research’s annual showcase of forward-looking computer-science technology, will feature several projects that show how the move toward NUIs is progressing. On March 9 and 10, thousands of Microsoft employees will have a chance to view the research on display, talk with the researchers involved, and seek ways to incorporate that work into new products that could be used by millions of people worldwide.
Not all the TechFest projects are NUI-related, of course. Microsoft Research investigates possibilities in dozens of computer-science areas. But quite a few of the demos to be shown do shine a light on natural user interfaces, and each points to a new way to see or interact with the world. One demo shows how patients’ medical images can be interpreted automatically, considerably enhancing the efficiency of a physician’s work. Another literally creates a new world—instantly converting real objects into digital 3-D objects that can be manipulated by a real human hand. A third acts as a virtual drawing coach for would-be artists. And yet another enables a simple digital stylus to understand whether a person wants to draw with it, paint with it, or, perhaps, even play it like a saxophone.
Semantic Understanding of Medical Images
Healthcare professionals today are overwhelmed by the volume of medical imagery. X-rays, MRIs, CT scans, ultrasounds, PET scans—all are growing more common as diagnostic tools.
But the sheer volume of these images also makes it more difficult to read and understand them in a timely fashion. To help make medical images easier to read and analyze, a team from Microsoft Research Cambridge has created InnerEye, a research project that uses the latest machine-learning techniques to speed image interpretation and improve diagnostic accuracy. InnerEye also has implications for improved treatments, such as enabling radiation oncologists to target treatment to tumors more precisely in sensitive areas such as the brain.
In the case of radiation therapy, it can take hours for a radiation oncologist to outline the edges of tumors and the healthy organs to be protected. InnerEye—developed by researcher Antonio Criminisi and a team of colleagues that included Andrew Blake, Ender Konukoglu, Ben Glocker, Abigail Sellen, Toby Sharp, and Jamie Shotton—greatly reduces the time needed to accurately delineate the boundaries of anatomical structures of interest in 3-D.
To use InnerEye, a radiologist or clinician uses a computer pointer on a screen image of a medical scan to highlight a part of the body that requires treatment. InnerEye then employs algorithms developed by Criminisi and his colleagues to accurately define the 3-D surface of the selected organ. In the resulting image, the highlighted organ—a kidney, for instance, or even a complete aorta—seems to almost leap from the rest of the image. The organ delineation offers a quick way of assessing things such as organ volume, tissue density, and other information that aids diagnosis.
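The workflow above can be sketched in miniature: a trained model labels each voxel as organ or background, and the region is grown outward from the clinician’s seed click. The decision-forest stand-in below (simple intensity-band “trees” that vote per voxel), the toy scan, and all thresholds are illustrative assumptions, not the actual InnerEye algorithms.

```python
def label_voxel(value, trees):
    """Each 'tree' votes organ vs. background for one voxel; majority wins.
    A crude stand-in for a trained per-voxel classifier."""
    votes = sum(1 for low, high in trees if low <= value <= high)
    return votes > len(trees) // 2

def delineate(scan, seed, trees):
    """Flood-fill outward from the clinician's seed click, keeping only
    voxels the forest labels as organ.  scan: dict (z, y, x) -> intensity."""
    region, stack = set(), [seed]
    while stack:
        v = stack.pop()
        if v in region or v not in scan or not label_voxel(scan[v], trees):
            continue
        region.add(v)
        z, y, x = v
        stack += [(z + 1, y, x), (z - 1, y, x), (z, y + 1, x),
                  (z, y - 1, x), (z, y, x + 1), (z, y, x - 1)]
    return region

# Toy 3-D scan: a bright 4x4x4 "organ" block inside a dark 8x8x8 volume.
scan = {(z, y, x): (100.0 if 2 <= z < 6 and 2 <= y < 6 and 2 <= x < 6 else 0.0)
        for z in range(8) for y in range(8) for x in range(8)}
organ = delineate(scan, (3, 3, 3), trees=[(80, 120), (90, 110), (70, 130)])
print(len(organ))  # 64: every voxel of the bright block, none outside it
```

Once such a region is extracted, measurements like organ volume fall out directly (here, simply the voxel count times the voxel size).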
InnerEye also enables extremely fast, intuitive visual navigation and inspection of 3-D images. A physician can navigate to an optimized view of the heart simply by clicking on the word “heart,” because the system already knows where each organ is. This yields considerable time savings, with big economic implications.
The InnerEye project team also is investigating the use of Kinect in the operating theater. Surgeons often wish to view a patient’s previously acquired CT or MR scans, but touching a mouse or keyboard could introduce germs. The InnerEye technology and Kinect help by automatically interpreting the surgeon’s hand gestures. This enables the surgeon to navigate naturally through the patient’s images.
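A touchless viewer of this kind reduces to mapping tracked hand motion onto navigation commands. The sketch below assumes a hypothetical displacement-based scheme; the gesture names, axes, and thresholds are invented for illustration and are not the actual InnerEye/Kinect gesture set.

```python
def interpret(dx, dy, dz):
    """Classify one tracked hand displacement (metres, camera coordinates)
    into a scan-viewer command.  Thresholds are illustrative guesses."""
    if abs(dx) > 0.15 and abs(dx) > abs(dy):        # dominant sideways sweep
        return "next_slice" if dx > 0 else "prev_slice"
    if abs(dz) > 0.10:                              # push toward / pull away
        return "zoom_in" if dz < 0 else "zoom_out"
    if abs(dy) > 0.15:                              # vertical sweep
        return "scroll_up" if dy < 0 else "scroll_down"
    return "idle"                                   # small jitter: ignore

print(interpret(0.2, 0.0, 0.0))   # next_slice
print(interpret(0.0, 0.0, -0.2))  # zoom_in
```

Keeping an explicit "idle" dead zone matters in practice: without it, sensor jitter would constantly scroll the scan while the surgeon’s hands are busy elsewhere.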
InnerEye has numerous potential applications in health care. Its automatic image analysis promises to make the work of surgeons, radiologists, and clinicians much more efficient—and, possibly, more accurate. In cancer treatment, InnerEye could be used to evaluate a tumor quickly and compare it in size and shape with earlier images. The technology also could be used to help assess the number and location of brain lesions caused by multiple sclerosis.
Blurring the Line Between the Real and the Virtual
Breaking down the barrier between the real world and the virtual world is a staple of science fiction—Avatar and The Matrix are but two recent examples. But technology is coming closer to actually blurring the line.
Microsoft Research Redmond researcher Hrvoje Benko and senior researcher Andy Wilson have taken a step toward making the virtual real with a project called MirageBlocks. Its aim is to simplify the process of digitally capturing images of everyday objects and to convert them instantaneously to 3-D images. The goal is to create a virtual mirror of the physical world, one so readily understood that a MirageBlocks user could take an image of a brick and use it to create a virtual castle—brick by brick.
Capturing and visualizing objects in 3-D long has fascinated scientists, but new technology makes it more feasible. In particular, Kinect for Xbox 360 gave Benko and Wilson—and intern Ricardo Jota—an easy-to-use, $150 gadget that easily could capture the depth of an object with its multicamera design. Coupled with new-generation 3-D projectors and 3-D glasses, Kinect helps make MirageBlocks perhaps the most advanced tool ever for capturing and manipulating 3-D imagery.
The MirageBlocks environment consists of a Kinect device, an Acer H5360 3-D projector, and Nvidia 3D Vision glasses synchronized to the projector’s frame rate. The Kinect captures the object image and tracks the user’s head position so that the virtual image is shown to the user with the correct perspective.
Users enter MirageBlocks’ virtual world by placing an object on a table top, where it is captured by the Kinect’s cameras. The object is instantly digitized and projected back into the workspace as a 3-D virtual image. The user then can move or rotate the virtual object using an actual hand or a numbered keypad. A user can take duplicate objects, or different objects, to construct a virtual 3-D model. To the user, the virtual objects have the same depth and size as their physical counterparts.
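The head-coupled rendering that keeps a virtual object at the correct perspective comes down to one geometric step: draw each virtual point where the line from the viewer’s tracked eye through that point meets the projection surface. The sketch below assumes the tabletop is the plane z = 0 and metric coordinates; both are illustrative simplifications, not the MirageBlocks renderer.

```python
def project_to_table(eye, point):
    """Return the tabletop (z = 0) position where the ray from the
    viewer's eye through a virtual 3-D point lands, so the projected
    pixel appears to the viewer at the point's true position."""
    ex, ey, ez = eye
    px, py, pz = point
    t = ez / (ez - pz)                 # ray parameter at the z = 0 plane
    return (ex + t * (px - ex), ey + t * (py - ey))

eye = (0.0, 0.0, 0.5)                  # head tracked 0.5 m above the table
shadow = project_to_table(eye, (0.1, 0.0, 0.1))
print(shadow)  # (0.125, 0.0): drawn slightly beyond the point's x position
```

Because the eye position enters the formula directly, re-running this projection every frame with fresh Kinect head-tracking data is what makes the virtual brick hold still in space as the user moves.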
MirageBlocks has several real-world applications. It could apply an entirely new dimension to simulation games, enabling game players to create custom models or devices from a few digitized pieces or to digitize any object and place it in a virtual game. MirageBlocks’ technology could change online shopping, enabling the projection of 3-D representations of an object. It could transform teleconferencing, enabling participants to examine and manipulate 3-D representations of products or prototypes. It might even be useful in health care—an emergency-room physician, for instance, could use a 3-D image of a limb with a broken bone to correctly align the break.
Giving the Artistically Challenged a Helping Hand
It’s fair to say that most people cannot draw well. But what if a computer could help by suggesting to the would-be artist certain lines to follow or shapes to create? That’s the idea behind ShadowDraw, created by Larry Zitnick—who works as a researcher in the Interactive Visual Media Group at Microsoft Research Redmond—and principal researcher Michael Cohen, with help from intern Yong Jae Lee from the University of Texas at Austin.
In concept, ShadowDraw seems disarmingly simple. A user begins drawing an object—a bicycle, for instance, or a face—using a stylus-based Cintiq 21UX tablet. As the drawing progresses, ShadowDraw surmises the subject of the emerging drawing and begins to suggest refinements by generating a “shadow” behind the would-be artist’s lines that resembles the drawn object. By taking advantage of ShadowDraw’s suggestions, the user can create a more refined drawing than otherwise possible, while retaining the individuality of their pencil strokes and overall technique.
The seeming simplicity of ShadowDraw, though, belies the substantial computing power being harnessed behind the screen. ShadowDraw is, at its heart, a database of 30,000 images culled from the Internet and other public sources. Edges are extracted from these original photographic images to provide stroke suggestions to the user.
The main component created by the Microsoft Research team is an interactive drawing system that reacts to the user’s pencil work in real time. ShadowDraw uses a novel, partial-matching approach that finds possible matches between different sub-sections of the user’s drawing and the database of edge images. Think of ShadowDraw’s behind-the-screen interface as a checkerboard—each square where a user draws a line will generate its own set of possible matches that cumulatively vote on suggestions to help refine a user’s work. The researchers also created a novel method for spatially blending the various stroke suggestions for the drawing.
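The checkerboard voting idea can be sketched concretely: every cell where the user has drawn compares its local content against the corresponding cell of each database image, and the per-cell matches vote for which image should supply the shadow. The per-cell “stroke density” descriptor and the tiny two-image database below are stand-ins for ShadowDraw’s real edge-patch features and 30,000-image collection.

```python
from collections import Counter

def vote(user_cells, database):
    """user_cells / database entries: dict mapping grid cell -> stroke
    density in [0, 1].  Each drawn cell votes for database images whose
    local content is similar; the most-voted image wins."""
    votes = Counter()
    for cell, density in user_cells.items():
        for name, image in database.items():
            if abs(image.get(cell, 0.0) - density) < 0.2:   # crude match
                votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None

db = {"bicycle": {(0, 0): 0.8, (0, 1): 0.6},
      "face":    {(0, 0): 0.1, (0, 1): 0.9}}
print(vote({(0, 0): 0.7, (0, 1): 0.5}, db))  # bicycle
```

Because each cell votes independently, a half-finished sketch that only matches in a few cells still produces a usable suggestion, which is what lets the shadow appear while the drawing is still emerging.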
To test ShadowDraw, Zitnick and his co-researchers enlisted eight men and eight women. Each was asked to draw five subjects—a shoe, a bicycle, a butterfly, a face, and a rabbit—with and without ShadowDraw. The rabbit image was a control—there were no rabbits in the database. When using ShadowDraw, the subjects were told they could use the suggested renderings or ignore them. And each subject was given 30 minutes to complete 10 drawings.
A panel of eight additional subjects judged the drawings on a scale of one to five, with one representing “poor” and five “good.” The panelists found that ShadowDraw was of significant help to people with average drawing skills—their drawings were significantly improved by ShadowDraw. Interestingly, the subjects rated beforehand as having poor or good drawing skills saw little improvement. Zitnick says the poor artists were so bad that ShadowDraw couldn’t even guess what they were attempting to draw, while the good artists already had sufficient skill to draw the test objects accurately.
Enabling One Pen to Simulate Many
Human beings have developed dozens of ways to render images on a piece of paper, a canvas, or another drawing surface. Pens, pencils, paintbrushes, crayons, and more—all can be used to create images or the written word.
Each, however, is held in a slightly different way. That can seem natural when using the device itself—people learn to manage a paintbrush in a way different from how they use a pen or a pencil. But those differences can present a challenge when attempting to work with a computer. A single digital stylus or pen can serve many functions, but to do so typically requires the user to hold the stylus in the same manner, regardless of the tool the stylus is mimicking.
A Microsoft Research team aimed to find a better way to design a computer stylus. The team—which included researcher Xiang Cao in the Human-Computer Interaction Group at Microsoft Research Asia; Shahram Izadi of Microsoft Research Cambridge; Benko and Ken Hinckley of Microsoft Research Redmond; Minghi Sun, a Microsoft Research Cambridge intern; Hyunyoung Song of the University of Maryland; and François Guimbretière of Cornell University—asked a question: How can a digital pen or stylus be as natural to use as the varied physical tools people employ? The solution, to be shown as part of a demo called Recognizing Pen Grips for Natural UI, is a digital pen enhanced with a capacitive multitouch sensor that knows where the user’s hand touches the pen and an orientation sensor that knows the angle at which the pen is held.
With that information, the digital pen can recognize different grips and automatically behave like the desired tool. If a user holds the digital pen like a paintbrush, the pen automatically behaves like a paintbrush. Hold it like a pen, and it behaves like a pen—with no need to flip a switch on the device manually or choose a different stylus mode.
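A grip recognizer of this kind is, at bottom, a classifier over the touch map and orientation readings. The sketch below compresses each grip to two invented features, contact count and pen tilt, and matches a reading to the nearest stored prototype; the feature set, prototype values, and tool names are illustrative assumptions, not the team’s actual model.

```python
import math

# Hypothetical grip prototypes: (number of finger contacts, tilt in degrees).
PROTOTYPES = {
    "pen":        (3, 50.0),   # tripod grip, steep tilt
    "paintbrush": (4, 30.0),   # looser grip, shallower tilt
    "wand":       (5, 0.0),    # full-fist grip, held level
}

def classify_grip(contacts, tilt_deg):
    """Pick the prototype nearest to the sensed reading.  Tilt is scaled
    down so one finger contact counts about as much as ten degrees."""
    def dist(proto):
        c, t = proto
        return math.hypot(contacts - c, (tilt_deg - t) / 10.0)
    return min(PROTOTYPES, key=lambda name: dist(PROTOTYPES[name]))

print(classify_grip(3, 55.0))  # pen
print(classify_grip(5, 5.0))   # wand
```

A real system would classify richer features from the full capacitive image, but the structure is the same: the pen's mode is simply the grip class the classifier emits, with no explicit mode switch for the user to operate.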
The implications of the technology are considerable. Musical instruments such as flutes and saxophones, along with many other objects, build on similar shapes. A digital stylus with grip and orientation sensors conceivably could duplicate them all, while enabling the user to hold the stylus in the manner that is most natural. Even game controllers could be adapted to modify their behavior depending on how they are held, whether as a driving device for auto-based games or as a weapon in games such as Halo.