TechFest 2011

Summary

TechFest is an annual event, for Microsoft employees and guests, that showcases the most exciting research from Microsoft Research’s locations around the world.  Researchers share their latest work—and the technologies emerging from those efforts.  The event provides a forum in which product teams and researchers can interact, fostering the transfer of groundbreaking technologies into Microsoft products.

We invite you to explore the projects and watch the videos.  Immerse yourself in TechFest content and see how today’s future will become tomorrow’s reality.

Feature Story

TechFest Focus: Natural User Interfaces

By Douglas Gantenbein | March 8, 2011 9:00 AM PT

For many people, using a computer still means using a keyboard and a mouse. But computers are becoming more like “us”—better able to anticipate human needs, work with human preferences, even work on our behalf.

Computers, in short, are moving rapidly toward widespread adoption of natural user interfaces (NUIs)—interfaces that are more intuitive, that are easier to use, and that adapt to human habits and wishes, rather than forcing humans to adapt to computers. Microsoft has been a driving force behind the adoption of NUI technology. The wildly successful Kinect for Xbox 360 device—launched in November 2010—is a perfect example. It recognizes users, needs no controller to work, and understands what the user wants to do.

It won’t be long before more and more devices work in similar fashion. Microsoft Research is working closely with Microsoft business units to develop new products that take advantage of NUI technology. In the months and years to come, a growing number of Microsoft products will recognize voices and gestures, read facial expressions, and make computing easier, more intuitive, and more productive.

TechFest 2011, Microsoft Research’s annual showcase of forward-looking computer-science technology, will feature several projects that show how the move toward NUIs is progressing. On March 9 and 10, thousands of Microsoft employees will have a chance to view the research on display, talk with the researchers involved, and seek ways to incorporate that work into new products that could be used by millions of people worldwide.

Not all the TechFest projects are NUI-related, of course. Microsoft Research investigates the possibilities in dozens of computer-science areas. But quite a few of the demos to be shown do shine a light on natural user interfaces, and each points to a new way to see or interact with the world. One demo shows how patients’ medical images can be interpreted automatically, enhancing considerably the efficiency of a physician’s work. One literally creates a new world—instantly converting real objects into digital 3-D objects that can be manipulated by a real human hand. A third acts as a virtual drawing coach to would-be artists. And yet another enables a simple digital stylus to understand whether a person wants to draw with it, paint with it, or, perhaps, even play it like a saxophone.

Semantic Understanding of Medical Images

Healthcare professionals today are overwhelmed by the volume of medical imagery. X-rays, MRIs, CT scans, ultrasounds, PET scans—all are growing more common as diagnostic tools.

But the sheer volume of these images also makes it more difficult to read and understand them in a timely fashion. To help make medical images easier to read and analyze, a team from Microsoft Research Cambridge has created InnerEye, a research project that uses the latest machine-learning techniques to speed image interpretation and improve diagnostic accuracy. InnerEye also has implications for improved treatments, such as enabling radiation oncologists to target treatment to tumors more precisely in sensitive areas such as the brain.

In the case of radiation therapy, it can take hours for a radiation oncologist to outline the edge of tumors and healthy organs to be protected. InnerEye—developed by researcher Antonio Criminisi and a team of colleagues that included Andrew Blake, Ender Konukoglu, Ben Glocker, Abigail Sellen, Toby Sharp, and Jamie Shotton—greatly reduces the time needed to delineate accurately the boundaries of anatomical structures of interest in 3-D.

To use InnerEye, a radiologist or clinician uses a computer pointer on a screen image of a medical scan to highlight a part of the body that requires treatment. InnerEye then employs algorithms developed by Criminisi and his colleagues to accurately define the 3-D surface of the selected organ. In the resulting image, the highlighted organ—a kidney, for instance, or even a complete aorta—seems to almost leap from the rest of the image. The organ delineation offers a quick way of assessing things such as organ volume, tissue density, and other information that aids diagnosis.

InnerEye also enables extremely fast, intuitive visual navigation and inspection of 3-D images. A physician can navigate to an optimized view of the heart simply by clicking on the word “heart,” because the system already knows where each organ is. This yields considerable time savings, with big economic implications.
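Once the organs in a scan have been located, jumping to one of them by name amounts to a lookup followed by a camera-framing step. The Python sketch below illustrates the idea with invented organ names, coordinates, and viewer parameters; it is not InnerEye’s actual code.

```python
# Minimal sketch of "click the word 'heart' to jump to it," assuming the
# recognition step has already produced a 3-D bounding box per organ.
# Organ names, coordinates, and the framing logic are illustrative only.
import numpy as np

# Hypothetical output of automatic anatomy recognition: axis-aligned
# bounding boxes in voxel coordinates (z, y, x), one per organ.
organ_boxes = {
    "heart":  (np.array([40, 120, 150]), np.array([90, 220, 260])),
    "kidney": (np.array([150, 200, 90]), np.array([200, 280, 160])),
}

def view_for_organ(name, margin=1.2):
    """Return a (center, extent) pair that frames the named organ."""
    lo, hi = organ_boxes[name]
    center = (lo + hi) / 2.0
    extent = (hi - lo) * margin          # pad the box a little
    return center, extent

center, extent = view_for_organ("heart")
print("center voxel:", center, "view extent:", extent)
```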

The InnerEye project team also is investigating the use of Kinect in the operating theater. Surgeons often wish to view a patient’s previously acquired CT or MR scans, but touching a mouse or keyboard could introduce germs. The InnerEye technology and Kinect help by automatically interpreting the surgeon’s hand gestures. This enables the surgeon to navigate naturally through the patient’s images.

InnerEye has numerous potential applications in health care. Its automatic image analysis promises to make the work of surgeons, radiologists, and clinicians much more efficient—and, possibly, more accurate. In cancer treatment, InnerEye could be used to evaluate a tumor quickly and compare it in size and shape with earlier images. The technology also could be used to help assess the number and location of brain lesions caused by multiple sclerosis.

Blurring the Line Between the Real and the Virtual

Breaking down the barrier between the real world and the virtual world is a staple of science fiction—Avatar and The Matrix are but two recent examples. But technology is coming closer to actually blurring the line.

Microsoft Research Redmond researcher Hrvoje Benko and senior researcher Andy Wilson have taken a step toward making the virtual real with a project called MirageBlocks. Its aim is to simplify the process of digitally capturing images of everyday objects and to convert them instantaneously to 3-D images. The goal is to create a virtual mirror of the physical world, one so readily understood that a MirageBlocks user could take an image of a brick and use it to create a virtual castle—brick by brick.

Capturing and visualizing objects in 3-D long has fascinated scientists, but new technology makes it more feasible. In particular, Kinect for Xbox 360 gave Benko and Wilson—and intern Ricardo Jota—an easy-to-use, $150 gadget that could capture the depth of an object with its multicamera design. Coupled with new-generation 3-D projectors and 3-D glasses, Kinect helps make MirageBlocks perhaps the most advanced tool ever for capturing and manipulating 3-D imagery.

The MirageBlocks environment consists of a Kinect device, an Acer H5360 3-D projector, and Nvidia 3D Vision glasses synchronized to the projector’s frame rate. The Kinect captures the object image and tracks the user’s head position so that the virtual image is shown to the user with the correct perspective.
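The head tracking matters because the virtual scene must be rendered from the viewer’s exact point of view to line up with the physical workspace. The Python sketch below shows the standard off-axis (“generalized”) perspective projection used for head-coupled displays; the screen-corner coordinates and head position are placeholders, and this is not necessarily the projection code MirageBlocks uses.

```python
# A sketch of head-coupled, off-axis projection: given the tracked head
# position and the physical corners of the display, build the asymmetric
# view frustum that keeps virtual objects aligned with the real workspace.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def off_axis_projection(eye, pa, pb, pc, near=0.05, far=10.0):
    """eye: tracked head position; pa, pb, pc: lower-left, lower-right,
    upper-left screen corners, all in the same metric coordinate frame."""
    vr = normalize(pb - pa)              # screen right axis
    vu = normalize(pc - pa)              # screen up axis
    vn = normalize(np.cross(vr, vu))     # screen normal, toward the viewer

    va, vb, vc = pa - eye, pb - eye, pc - eye
    d = -np.dot(va, vn)                  # distance from eye to screen plane

    l = np.dot(vr, va) * near / d
    r = np.dot(vr, vb) * near / d
    b = np.dot(vu, va) * near / d
    t = np.dot(vu, vc) * near / d

    proj = np.array([                    # glFrustum-style matrix
        [2*near/(r-l), 0,            (r+l)/(r-l),            0],
        [0,            2*near/(t-b), (t+b)/(t-b),            0],
        [0,            0,           -(far+near)/(far-near), -2*far*near/(far-near)],
        [0,            0,           -1,                      0]])
    rot = np.eye(4); rot[0, :3], rot[1, :3], rot[2, :3] = vr, vu, vn
    trans = np.eye(4); trans[:3, 3] = -eye
    return proj @ rot @ trans            # full view-projection matrix

# Example: a 0.5 m-wide screen lying on the table, head 0.6 m above it.
pa, pb, pc = np.array([-0.25, 0, 0]), np.array([0.25, 0, 0]), np.array([-0.25, 0.35, 0])
print(off_axis_projection(np.array([0.0, 0.15, 0.6]), pa, pb, pc))
```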

Users enter MirageBlocks’ virtual world by placing an object on a table top, where it is captured by the Kinect’s cameras. The object is instantly digitized and projected back into the workspace as a 3-D virtual image. The user then can move or rotate the virtual object using an actual hand or a numbered keypad. A user can take duplicate objects, or different objects, to construct a virtual 3-D model. To the user, the virtual objects have the same depth and size as their physical counterparts.

MirageBlocks has several real-world applications. It could apply an entirely new dimension to simulation games, enabling game players to create custom models or devices from a few digitized pieces or to digitize any object and place it in a virtual game. MirageBlocks’ technology could change online shopping, enabling the projection of 3-D representations of an object. It could transform teleconferencing, enabling participants to examine and manipulate 3-D representations of products or prototypes. It might even be useful in health care—an emergency-room physician, for instance, could use a 3-D image of a limb with a broken bone to correctly align the break.

Giving the Artistically Challenged a Helping Hand

It’s fair to say that most people cannot draw well. But what if a computer could help by suggesting to the would-be artist certain lines to follow or shapes to create? That’s the idea behind ShadowDraw, created by Larry Zitnick—who works as a researcher in the Interactive Visual Media Group at Microsoft Research Redmond—and principal researcher Michael Cohen, with help from intern Yong Jae Lee from the University of Texas at Austin.

In concept, ShadowDraw seems disarmingly simple. A user begins drawing an object—a bicycle, for instance, or a face—using a stylus-based Cintiq 21UX tablet. As the drawing progresses, ShadowDraw surmises the subject of the emerging drawing and begins to suggest refinements by generating a “shadow” behind the would-be artist’s lines that resembles the drawn object. By taking advantage of ShadowDraw’s suggestions, the user can create a more refined drawing than otherwise possible, while retaining the individuality of his or her pencil strokes and overall technique.

The seeming simplicity of ShadowDraw, though, belies the substantial computing power being harnessed behind the screen. ShadowDraw is, at its heart, a database of 30,000 images culled from the Internet and other public sources. Edges are extracted from these original photographic images to provide stroke suggestions to the user.

The main component created by the Microsoft Research team is an interactive drawing system that reacts to the user’s pencil work in real time. ShadowDraw uses a novel, partial-matching approach that finds possible matches between different sub-sections of the user’s drawing and the database of edge images. Think of ShadowDraw’s behind-the-screen interface as a checkerboard—each square where a user draws a line will generate its own set of possible matches that cumulatively vote on suggestions to help refine a user’s work. The researchers also created a novel method for spatially blending the various stroke suggestions for the drawing.
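A greatly simplified version of that checkerboard voting scheme can be sketched in a few dozen lines of Python. Here each grid cell of the partial drawing is compared directly against the corresponding cell of database edge images and votes by cosine similarity; the real system relies on specialized edge descriptors and fast hashing, so treat this only as an illustration of the voting idea.

```python
# Simplified ShadowDraw-style matching: split the canvas into a grid,
# let each non-empty cell of the user's partial drawing vote for database
# images whose corresponding cell looks similar, and rank candidates by
# accumulated votes.
import numpy as np

GRID = 4          # 4x4 checkerboard of cells
CELL = 16         # each cell is 16x16 pixels, so the canvas is 64x64

def cells(image):
    """Yield (row, col, cell_pixels) for a 64x64 binary edge image."""
    for r in range(GRID):
        for c in range(GRID):
            yield r, c, image[r*CELL:(r+1)*CELL, c*CELL:(c+1)*CELL]

def vote(partial_drawing, edge_database, top_k=3):
    """Each non-empty cell of the drawing votes for similar database images."""
    scores = np.zeros(len(edge_database))
    for r, c, cell in cells(partial_drawing):
        if cell.sum() == 0:
            continue                      # user hasn't drawn here yet
        q = cell.flatten().astype(float)
        q /= np.linalg.norm(q)
        for i, db_img in enumerate(edge_database):
            d = db_img[r*CELL:(r+1)*CELL, c*CELL:(c+1)*CELL].flatten().astype(float)
            n = np.linalg.norm(d)
            if n > 0:
                scores[i] += float(q @ d) / n   # cosine similarity as the vote
    return np.argsort(scores)[::-1][:top_k]     # indices of best matches

# Toy usage: two random "edge images" stand in for the 30,000-image database.
rng = np.random.default_rng(0)
database = [rng.integers(0, 2, (64, 64)) for _ in range(2)]
drawing = np.zeros((64, 64), dtype=int)
drawing[10:20, 10:20] = database[1][10:20, 10:20]
print("best matches:", vote(drawing, database))
```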

To test ShadowDraw, Zitnick and his co-researchers enlisted eight men and eight women. Each was asked to draw five subjects—a shoe, a bicycle, a butterfly, a face, and a rabbit—with and without ShadowDraw. The rabbit image was a control—there were no rabbits in the database. When using ShadowDraw, the subjects were told they could use the suggested renderings or ignore them. And each subject was given 30 minutes to complete 10 drawings.

A panel of eight additional subjects judged the drawings on a scale of one to five, with one representing “poor” and five “good.” The panelists found that ShadowDraw was of significant help to people with average drawing skills—their drawings were significantly improved by ShadowDraw. Interestingly, the subjects rated as having poor or good drawing skills, pre-ShadowDraw, saw little improvement. Zitnick says the poor artists were so bad that ShadowDraw couldn’t even guess what they were attempting to draw. The good artists already had sufficient skills to draw the test objects accurately.

Enabling One Pen to Simulate Many

Human beings have developed dozens of ways to render images on a piece of paper, a canvas, or another drawing surface. Pens, pencils, paintbrushes, crayons, and more—all can be used to create images or the written word.

Each, however, is held in a slightly different way. That can seem natural when using the device itself—people learn to manage a paintbrush in a way different from how they use a pen or a pencil. But those differences can present a challenge when attempting to work with a computer. A single digital stylus or pen can serve many functions, but to do so typically requires the user to hold the stylus in the same manner, regardless of the tool the stylus is mimicking.

A Microsoft Research team aimed to find a better way to design a computer stylus. The team—which included researcher Xiang Cao in the Human-Computer Interaction Group at Microsoft Research Asia; Shahram Izadi of Microsoft Research Cambridge; Benko and Ken Hinckley of Microsoft Research Redmond; Minghi Sun, a Microsoft Research Cambridge intern; Hyunyoung Song of the University of Maryland; and François Guimbretière of Cornell University—asked the question: How can a digital pen or stylus be as natural to use as the varied physical tools people employ? The solution, to be shown as part of a demo called Recognizing Pen Grips for Natural UI, is a digital pen enhanced with a capacitive, multitouch sensor that knows where the user’s hand touches the pen and an orientation sensor that knows at what angle the pen is held.

With that information, the digital pen can recognize different grips and automatically behave like the desired tool. If a user holds the digital pen like a paintbrush, the pen automatically behaves like a paintbrush. Hold it like a pen, and it behaves like a pen, with no need to manually flip a switch on the device or choose a different stylus mode.
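A rough sense of how grip data could drive that mode switching is sketched below, assuming the pen reports a small capacitive touch image of where the hand contacts the barrel plus two orientation angles. The nearest-neighbor classifier, sensor shapes, and grip labels are all illustrative; this is not the team’s actual recognizer.

```python
# Minimal grip-recognition sketch: concatenate the flattened capacitive
# touch map with the orientation angles, then classify with k-nearest
# neighbors over recorded example grips.
import numpy as np

def features(touch_image, pitch, roll):
    """Concatenate the flattened touch map with the orientation angles."""
    return np.concatenate([touch_image.flatten().astype(float), [pitch, roll]])

def classify_grip(sample, training_samples, training_labels, k=3):
    """Vote among the k most similar recorded grips."""
    dists = [np.linalg.norm(sample - t) for t in training_samples]
    nearest = np.argsort(dists)[:k]
    labels = [training_labels[i] for i in nearest]
    return max(set(labels), key=labels.count)

# Toy training data: a 10x20 capacitive map around the barrel per example;
# the "grip zones" and pitch values are invented.
rng = np.random.default_rng(1)
train_X, train_y = [], []
for label, region in [("pen", (0, 5)), ("brush", (5, 10))]:
    for _ in range(5):
        img = np.zeros((10, 20)); img[region[0]:region[1], :8] = rng.random((5, 8))
        train_X.append(features(img, pitch=50 if label == "pen" else 20, roll=0.0))
        train_y.append(label)

query = np.zeros((10, 20)); query[0:5, :8] = 0.7
print("detected grip:", classify_grip(features(query, 48, 2), train_X, train_y))
```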

The implications of the technology are considerable. Musical instruments such as flutes or saxophones and many other objects all build on similar shapes. A digital stylus with grip and orientation sensors conceivably could duplicate all, while enabling the user to hold the stylus in the manner that is most natural. Even game controllers could be adapted to modify their behavior depending on how they are held, whether as a driving device for auto-based games or as a weapon in games such as Halo.

What is TechFest?

The latest thinking.  The freshest ideas.


We invite you to explore the projects, watch the videos, follow the buzz, and join the discussion on Facebook and Twitter.  Immerse yourself in TechFest content and see how today’s future will become tomorrow’s reality.


Projects

3-D, Photo-Real Talking Head

Our research showcases a new, 3-D, photo-real talking head with freely controlled head motions and facial expressions. It extends our prior, high-quality, 2-D, photo-real talking head to 3-D. First, we apply a 2-D-to-3-D reconstruction algorithm frame by frame on a 2-D video to construct a 3-D training database. In training, super-feature vectors consisting of 3-D geometry, texture, and speech are formed to train a statistical, multistreamed, Hidden Markov Model (HMM). The HMM then is used to synthesize both the trajectories of geometric animation and dynamic texture. The 3-D talking head can be animated by the geometric trajectory, while the facial expressions and articulator movements are rendered with dynamic texture sequences. Head motions and facial expression also can be separately controlled by manipulating corresponding parameters. The new 3-D talking head has many useful applications, such as voice agents, telepresence, gaming, and speech-to-speech translation. Learn more…
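The training-data step described above, forming per-frame super-feature vectors from geometry, texture, and speech, can be sketched as follows. The hmmlearn library’s single-stream Gaussian HMM stands in here for the multistreamed HMM used in the project, and every dimension and data value is an invented placeholder.

```python
# Rough sketch: build per-frame "super-feature vectors" and fit an HMM.
import numpy as np
from hmmlearn.hmm import GaussianHMM   # pip install hmmlearn

n_frames, geom_dim, tex_dim, speech_dim = 200, 30, 20, 13
rng = np.random.default_rng(0)
geometry = rng.normal(size=(n_frames, geom_dim))    # e.g., 3-D mesh parameters
texture  = rng.normal(size=(n_frames, tex_dim))     # e.g., texture PCA coefficients
speech   = rng.normal(size=(n_frames, speech_dim))  # e.g., per-frame audio features

super_vectors = np.hstack([geometry, texture, speech])   # shape (200, 63)

model = GaussianHMM(n_components=8, covariance_type="diag", n_iter=20)
model.fit(super_vectors)

# Synthesis stand-in: sample a new trajectory of joint geometry+texture
# frames from the trained model (the real system generates smooth
# trajectories conditioned on the input speech).
generated, _ = model.sample(50)
print(generated.shape)   # (50, 63)
```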

3-D Scanning with a Regular Camera

3-D television is creating a huge buzz in the consumer space, but the generation of 3-D content remains a largely professional endeavor. Our research demonstrates an easy-to-use system for creating photorealistic, 3-D-image-based models simply by walking around an object of interest with your phone, still camera, or video camera. The objects might be your custom car or motorcycle, a wedding cake or dress, a rare musical instrument, or a handcrafted artwork. Our system uses 3-D stereo matching techniques combined with image-based modeling and rendering to create a photorealistic model you can navigate simply by spinning it around on your screen, tablet, or mobile device.
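As a rough illustration of the stereo-matching step, the OpenCV sketch below matches features between two frames from such a walk-around video, estimates the relative camera pose, and triangulates a sparse set of 3-D points. The actual system combines many views and adds image-based modeling and rendering on top, and the camera intrinsics and file names shown are placeholders.

```python
# Two-view sparse reconstruction sketch with OpenCV: detect and match ORB
# features, estimate relative pose, and triangulate 3-D points.
import numpy as np
import cv2

def sparse_points_from_two_frames(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at origin
    P2 = K @ np.hstack([R, t])                          # second camera pose
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T                     # N x 3 point cloud

# Placeholder intrinsics for a typical phone camera at 640x480.
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])
# cloud = sparse_points_from_two_frames(cv2.imread("frame1.jpg", 0),
#                                       cv2.imread("frame2.jpg", 0), K)
```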

Applied Sciences Group: Smart Interactive Displays

Steerable AutoStereo 3-D Display: We use a special, flat optical lens (Wedge) behind an LCD monitor to direct a narrow beam of light into each of a viewer’s eyes. A Kinect head tracker follows the user’s position relative to the display, enabling the prototype to steer that narrow beam to the user. The combination creates a 3-D image that is steered to the viewer without the need for glasses or for holding your head in place.

Steerable Multiview Display: The same optical system used in the 3-D system, Wedge behind an LCD, is used to steer two separate images to two separate people rather than two separate eyes, as in the 3-D case. Using a Kinect head tracker, we find and track multiple viewers and send each viewer his or her own unique image. Therefore, two people can be looking at the same display but see two completely different images. If the two users switch positions, each image is continuously steered toward its intended viewer.

Retro-Reflective Air-Gesture Display: Sometimes, it’s better to control with gestures than buttons. Using a retro-reflective screen and a camera close to the projector makes all objects cast a shadow, regardless of their color. This makes it easy to apply computer-vision algorithms to sense above-screen gestures that can be used for control, navigation, and many other applications.

A Display That Can See: Using the flat Wedge optic in camera mode behind a special, transparent organic-light-emitting-diode display, we can capture images that are both on and above the display. This enables touch and above-screen gesture interfaces, as well as telepresence applications.

Kinect-Based Virtual Window: Using Kinect, we track a user’s position relative to a 3-D display to create the illusion of looking through a window. This view-dependent rendering technique is used in both the Wedge 3-D and multiview demos, but the effect is much more apparent in this demo. The user quickly realizes the need for a multiview display, because with a conventional display the illusion holds for only one viewer. This technique, along with the Wedge 3-D output and 3-D input techniques we are developing, is among the basic building blocks for the ultimate telepresence display. This Magic Window is a bidirectional, light-field, interactive display that gives multiple users in a telepresence session the illusion that they are interacting with and talking to each other through a simple glass window. Learn more…

Cloud Data Analytics from Excel

Excel is an established data-collection and data-analysis tool in business, technical computing, and academic research. Excel offers an attractive user interface, easy-to-use data entry, and substantial interactivity for what-if analysis. But data in Excel is not readily discoverable and, hence, does not promote data sharing. Moreover, Excel does not offer scalable computation for large-scale analytics. Increasingly, researchers encounter a deluge of data, and when working in Excel, it is not easy to invoke analytics to explore data, find related data sets, or invoke external models. Our project shows how we seamlessly integrate cloud storage and scalable analytics into Excel through a research ribbon. Any analyst can use our tool to discover and import data from the cloud, invoke cloud-scale data analytics to extract information from large data sets, invoke models, and then store data in the cloud—all through a spreadsheet with which they are already familiar. Learn more…
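The ribbon itself lives inside Excel, but the round trip it automates can be sketched from a script: pull a worksheet’s data, hand it to a cloud analytics service, and write the results back beside the originals. The endpoint URL, job format, and file name in the sketch below are entirely hypothetical; only the pandas and requests calls are real.

```python
# Hypothetical spreadsheet-to-cloud round trip: read a sheet, submit it to a
# placeholder analytics service, and write the returned summary to a new sheet.
import pandas as pd
import requests

df = pd.read_excel("measurements.xlsx", sheet_name="Sheet1")   # placeholder workbook

resp = requests.post(
    "https://analytics.example.com/jobs",            # placeholder service
    json={"operation": "summary_statistics",
          "data": df.to_dict(orient="records")},
    timeout=60,
)
resp.raise_for_status()
summary = pd.DataFrame(resp.json()["result"])        # hypothetical response shape

# Store the returned aggregates next to the raw data in a new sheet.
with pd.ExcelWriter("measurements.xlsx", mode="a", engine="openpyxl") as writer:
    summary.to_excel(writer, sheet_name="CloudSummary", index=False)
```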

Controlling Home Heating with Occupancy Prediction

Home heating uses more energy than any other residential energy expenditure, making increasing the efficiency of home heating an important goal for saving money and protecting the environment. We have built a home-heating system, PreHeat, that automatically programs your thermostat based on when you are home. PreHeat’s goal is to reduce the amount of time a household’s thermostat needs to be on without compromising the comfort of household members. PreHeat builds a predictive model of when the house is occupied and uses the model to optimize when the house is heated, to save energy without sacrificing comfort. Our system consists of Wi-Fi and passive, IR-based occupancy sensors; temperature sensors; heating-system controllers for U.S. forced-air systems and for U.K. water-filled radiators and under-floor heating; and PC-based control software using machine learning to predict schedules based on current and past occupancy. Learn more…
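A simplified version of that prediction loop is sketched below: each day is a vector of 15-minute occupancy slots, past days whose pattern so far best matches today are found, and the heat is switched on when those days suggest the house is about to be occupied. The slot size, neighbor count, and preheat lead time are illustrative rather than the published PreHeat parameters.

```python
# Simplified occupancy-prediction loop for thermostat control.
import numpy as np

SLOTS_PER_DAY = 96          # 15-minute slots

def predict_next_slots(today_so_far, past_days, horizon=4, k=5):
    """Predict occupancy for the next `horizon` slots by averaging the
    k past days most similar to today's partial occupancy pattern."""
    n = len(today_so_far)
    dists = [np.sum(np.abs(day[:n] - today_so_far)) for day in past_days]
    nearest = np.argsort(dists)[:k]
    votes = np.mean([past_days[i][n:n + horizon] for i in nearest], axis=0)
    return votes >= 0.5      # True = expected occupied

def heat_should_be_on(today_so_far, past_days, preheat_lead_slots=2):
    """Heat if the house is occupied now or expected to be occupied within
    the preheat lead time."""
    if today_so_far[-1] == 1:
        return True
    prediction = predict_next_slots(today_so_far, past_days,
                                    horizon=preheat_lead_slots)
    return bool(prediction.any())

# Toy history: the household is home most evenings from about 5 p.m. onward.
history = [np.concatenate([np.zeros(68), np.ones(SLOTS_PER_DAY - 68)])
           for _ in range(14)]
today = np.zeros(67)                                  # 4:45 p.m., nobody home yet
print("heat on?", heat_should_be_on(today, history))  # preheats for the evening
```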

Face Recognition in Video

Face recognition in video is an emerging technology that will have great impact on user experience in fields such as television, gaming, and communication. In the near future, a television or an Xbox will be able to recognize people in the living room, home video will be annotated automatically and become searchable, and TV watchers will be able to get information about an unfamiliar actor, athlete, or singer just by pointing to the person on the screen. Our research showcases the face-recognition technology developed by iLabs. Our technology includes novel algorithms in face detection, recognition, and tracking. The research demonstrates semi-automatic labeling of videos, a novel TV-watching experience using faces in a video as hyperlinks to get more information, and automatic recognition of the person in front of the television, Xbox, or computer.
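The iLabs algorithms themselves are not reproduced here, but the detect-and-track stages the description mentions can be illustrated with a minimal OpenCV sketch: detect faces in every frame, then associate detections across frames by bounding-box overlap so each person keeps a stable track ID. The video file name is a placeholder.

```python
# Minimal face detect-and-track sketch with OpenCV's stock face detector.
import cv2

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a; bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / float(aw * ah + bw * bh - inter)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

tracks, next_id = {}, 0                          # track_id -> last seen box
cap = cv2.VideoCapture("home_video.mp4")         # placeholder file name
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for box in faces:
        # Assign each detection to the existing track it overlaps most.
        best = max(tracks.items(), key=lambda t: iou(t[1], box), default=None)
        if best and iou(best[1], box) > 0.3:
            tracks[best[0]] = tuple(box)
        else:
            tracks[next_id] = tuple(box); next_id += 1
cap.release()
print("distinct face tracks seen:", len(tracks))
```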

Fuzzy Contact Search for Windows Phone 7

Mobile-phone users typically search for contacts in their contact list by keying in names or email IDs. Users frequently make various types of mistakes, including phonetic, transposition, deletion, and substitution errors, and, in the specific case of mobile phones, the nature of the input mechanism makes mistakes more probable. We propose a fuzzy-contact-search feature to help users find the right contacts despite making mistakes while keying in a query. The feature is based on the novel, hashing-based spelling-correction technology developed by Microsoft Research India. We support many languages, including English, French, German, Italian, Spanish, Portuguese, Polish, Dutch, Japanese, Russian, Arabic, Hebrew, Chinese, Korean, and Hindi. We have built a Windows Phone 7 app to demonstrate our fuzzy contact search. The solution is lightweight and can be used in any client-side contact-search scenario.
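The hashing-based spelling-correction technology from Microsoft Research India is not reproduced here; the sketch below only shows the general shape of error-tolerant contact lookup: index contacts by character bigrams, fetch candidates that share bigrams with a misspelled query, and re-rank them by edit distance.

```python
# Illustrative fuzzy contact search: bigram inverted index plus edit-distance
# re-ranking.
from collections import defaultdict

def bigrams(s):
    s = "^" + s.lower() + "$"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def edit_distance(a, b):
    """Standard dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class FuzzyContacts:
    def __init__(self, names):
        self.names = names
        self.index = defaultdict(set)          # bigram -> contact indices
        for i, name in enumerate(names):
            for bg in bigrams(name):
                self.index[bg].add(i)

    def search(self, query, max_results=3):
        candidates = set()
        for bg in bigrams(query):              # hash lookup per query bigram
            candidates |= self.index[bg]
        ranked = sorted(candidates,
                        key=lambda i: edit_distance(query.lower(),
                                                    self.names[i].lower()))
        return [self.names[i] for i in ranked[:max_results]]

contacts = FuzzyContacts(["Antonio Criminisi", "Andrew Blake", "Abigail Sellen"])
print(contacts.search("Andru Blaek"))          # tolerates typos in the query
```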

High-Performance Cancer Screening

Our research demonstrates high-performance, GPU-based 3-D rendering for colon-cancer screening. The VCViewer provides a gesture-based user interface for the navigation and analysis of 3-D images generated by computed-tomography (CT) scans for colon-cancer screening. This viewer is supported by a server-side volume-rendering engine implemented by Microsoft Research. Our work shows a real-world, life-saving medical application for this engine. In addition, we show high-performance, CPU-based image processing needed to prepare CT colonoscopy images for diagnostic viewing. This processing was developed at the 3-D Imaging Lab at Massachusetts General Hospital and has been adapted for task and data parallelism in joint collaboration with Microsoft Developer and Platform Evangelism, Microsoft Research, and Intel.

InnerEye: Visual Recognition in the Hospital

Our research shows how a single, underlying image-recognition algorithm can enable a multitude of clinical applications, such as semantic image navigation, multimodal image registration, quality control, content-based image search, and natural user interfaces for surgery, all within the Microsoft Amalga unified intelligence system. Learn more…

Interactive Information Visualizations

Our research presents novel, interactive visualizations to help people understand large amounts of data:

  • iSketchVis applies the familiar, collaborative features of a whiteboard interface to the accurate data-exploration capabilities of computer-aided data visualization. It enables people to sketch charts and explore their data visually, on a pen-based tablet—or collaboratively, on whiteboards.
  • NetCharts enables people to analyze large data sets consisting of multiple entity types with multiple attributes. It uses simple charts to show aggregated data. People can explore these aggregates by dragging them out to create new charts.
  • Sets traditionally are represented by Euler diagrams with bubble-like shapes. This research presents two techniques to simplify Euler diagrams. In addition, we demonstrate LineSets, which uses a single, continuous curve to represent sets. It simplifies set intersections and offers multiple interactions.

MirageBlocks

Our research demonstrates the use of 3-D projection, combined with a Kinect depth camera, to capture and display 3-D objects. Any physical object brought into the demo can be digitized instantaneously and viewed in 3-D. For example, we show a simple modeling application in which complex 3-D models can be constructed with just a few wooden blocks by digitizing and adding one block at a time. This setup also can be used in telepresence scenarios, in which what is real on your collaborator’s table is virtual—3-D projected—on yours, and vice versa. Our work shows how simulating real-world physics behaviors can be used to manipulate virtual 3-D objects. Our research uses a 3-D projector with active shutter glasses.

Mobile Photography: Capture, Process, and View

The mobile phone is becoming the most popular consumer camera. While the benefits are quite clear, the mobile scenario presents several challenges. It is not always easy to capture good photos. Image-processing tools can improve photos after capture, but there are few tools tailored to on-phone image manipulation. We present phone-based image-enhancement tools that are tightly integrated with cloud services. Heavy computation is off-loaded to the cloud, which enables faster results without impacting the phone’s performance.

Project Emporia: Personalized News

Project Emporia is a personalized news reader offering 250,000 articles daily as discovered through social news feeds. It combines state-of-the-art recommendation systems (Matchbox) with automatic content classification (ClickPredict) to enable users to fine-tune their news channels by category or a custom-keyword channel, combined with “more-like-this”/”less-like-this” votes. It is available as a mobile client as well as on the web.

Recognizing Pen Grips for Natural UI

By enabling multitouch sensing on a digital pen, we can recognize how the user is holding it. In the real world, people hold tools such as pens, paintbrushes, sketching pencils, knives, and compasses differently, and we enable a user to alter the grip on a digital pen to switch between functionalities. This enables a natural UI on the pen—mode switches are no longer necessary. Learn more…

Rich Interactive Narratives

Recent advances in visualization technologies have spawned a potent brew of visually rich applications that enable exploration over potentially large, complex data sets. Examples include GigaPan.org, Photosynth.net, PivotViewer, and WorldWide Telescope. At the same time, the narrative remains a dominant form for generating emotionally captivating content—movies or novels—or imparting complex knowledge, as in textbooks or journals. The Rich Interactive Narratives project aims to combine the compelling, time-tested narrative elements of multimedia storytelling with the information-rich, exploratory nature of the latest generation of information-visualization and -exploration technologies. We approach the problem not as a one-off application, Internet site, or proprietary framework, but rather as a data model that transcends a particular platform or technology. This has the potential of enabling entirely new ways for creating, transforming, augmenting, and presenting rich interactive content. Learn more…

ShadowDraw: Interactive Sketching Helper

Do you want to be able to sketch or draw better? ShadowDraw is an interactive assistant for freehand drawing. It automatically recognizes what you’re trying to draw and suggests new pen strokes for you to trace. As you draw new strokes, ShadowDraw refines its models in real time and provides new suggestions. ShadowDraw contains a large database of images with objects that a user might want to draw. The edges from any images that match the user’s current drawing are merged and shown as suggested “shadow strokes.” The user then can trace these strokes to improve the drawing. Learn more…

Social News Search for Companies

Social News Search for Companies uses social public data to build a great news portal for companies. The curation of this page can be crowdsourced to improve the quality of results. We tackle two questions: How can we use social media to provide a rich, topical, searchable, living news dashboard for any given company, and can we build an environment where the curation of the sources of content for a company page is done by the users of the page rather than by an editor? Learn more…

Videos


Watch the TechFest 2011 Videos

3-D Scanning with a regular camera or phone!

Watch video

3-D television is creating a huge buzz in the consumer space, but the generation of 3-D content remains a largely professional endeavor. Our research demonstrates an easy-to-use system for creating photorealistic, 3-D-image-based models simply by walking around an object of interest with your phone, still camera, or video camera. The objects might be your custom car or motorcycle, a wedding cake or dress, a rare musical instrument, or a hand-crafted artwork. Our system uses 3-D stereo matching techniques combined with image-based modeling and rendering to create a photorealistic model you can navigate simply by spinning it around on your screen, tablet, or mobile device.

3-D, Photo-Real Talking Head

Watch video

Dynamic texture mapping helps bypass the difficulties in rendering soft tissues like lips, tongue, eyes, and wrinkles, moving us one step closer to being able to create a more realistic personal avatar.

Applied Sciences Group: Smart Interactive Displays

Watch video

Steven Bathiche, Director, Microsoft Applied Sciences, shares his team’s latest work on the next generation of Smart Interactive Displays.

Facial Recognition in Videos

Watch video

Face recognition in video is an emerging technology that will have great impact on user experience in fields such as television, gaming, and communication. In the near future, a television or an Xbox will be able to recognize people in the living room, home video will be annotated automatically and become searchable, and TV watchers will be able to get information about an unfamiliar actor, athlete, or singer just by pointing to the person on the screen. Our research showcases the face-recognition technology developed by Innovation Labs. Our technology includes novel algorithms in face detection, recognition, and tracking. The research demonstrates semi-automatic labeling of videos, a novel TV-watching experience using faces in a video as hyperlinks to get more information, and automatic recognition of the person in front of the television, Xbox, or computer.

High-Performance Cancer Screening

Watch video

See how a high–performance, 3-D rendering engine can be transformed into a real-world, life-saving medical application.

InnerEye: Visual Recognition in the Hospital

Watch video

InnerEye focuses on the analysis of patient scans using machine learning techniques for automatic detection and segmentation of healthy anatomy as well as anomalies.

MirageBlocks

Watch video

See how simulating real-world physics behaviors can be used to manipulate virtual 3-D objects using 3-D projection and a Kinect depth camera.

Mobile Photography: Capture, Process, and View

Watch video

The mobile phone is becoming the most popular consumer camera. While the benefits are quite clear, the mobile scenario presents several challenges. It is not always easy to capture good photos. Image-processing tools can improve photos after capture, but there are few tools tailored to on-phone image manipulation. We present phone-based image enhancement tools that are tightly integrated with cloud services. Heavy computation is off-loaded to the cloud, which enables faster results without impacting the phone’s performance.

ShadowDraw

Watch video

An object-recognition research project delivers an interactive assistant for freehand drawing by recognizing what you’re trying to draw and suggesting traceable pen strokes to improve your drawing.

A montage of impact made by Microsoft Research

Watch video

Nearly every product that Microsoft ships includes technology from Microsoft Research. Through exploration and collaboration with product groups and academic institutions, Microsoft Research advances the state of the art of computing.
