Microsoft Research at SIGGRAPH 2014

Published August 11, 2014

Posted by Rob Knies

Microsoft researchers will present a broad spectrum of new research at SIGGRAPH 2014, the 41st International Conference and Exhibition on Computer Graphics and Interactive Techniques, which starts today in Vancouver, British Columbia. Sponsored by the Association for Computing Machinery, SIGGRAPH is at the cutting edge of research in computer graphics and related areas, such as computer vision and interactive systems. SIGGRAPH has evolved to become an international community of respected technical and creative individuals, attracting researchers, artists, developers, filmmakers, scientists, and business professionals from all over the world. The research presented from Microsoft was developed across our global labs and ranges from converting any camera into a depth camera, to optimizing a scheme for clothing animation, to pushing the boundaries of high-fidelity facial animation and performance capture techniques.

Depth camera and performance capture

Shahram Izadi, principal researcher at Microsoft Research, and his collaborators will present two papers this year. The first, Real-Time Non-Rigid Reconstruction Using an RGB-D Camera, written with academic partners at Stanford, MPI, and Erlangen, demonstrates interactive performance capture using a novel GPU-based algorithm and a high-resolution Kinect-like depth camera they have developed. The idea is to bring the level of performance capture that we see in Hollywood movies into our living rooms and everyday lives.

The second, Learning to Be a Depth Camera for Close-Range Human Capture and Interaction, has already garnered broad attention. The paper demonstrates how to turn any cheap visible-light camera, be it a web camera or even the camera on your mobile phone, into a depth sensor to create rich interactive scenarios. In describing the work, Izadi says, "In recent years, we've seen a great deal of excitement regarding depth cameras such as the Kinect. These essentially enrich the ways that computers can see the world beyond a regular 2-D camera, and they aid in many tasks in computer vision, such as background segmentation, resolving scale and so forth. However, there are many scenarios that are currently prohibitive for depth cameras because of power consumption, size and cost."

"Our goal was to build a very cheap depth camera, around $1 in cost," says Izadi. The technique applies simple modifications to a regular RGB camera and uses a new machine-learning algorithm, based on decision trees, that takes the modified RGB images and automatically and accurately maps them to depth. This allowed the team, with lead researchers Sean Fanello and Cem Keskin, to turn any camera into a depth camera for scenarios where you specifically want to sense the hands and faces of users. "So it is not a general depth camera that can sense any object, but it works extremely well for hands and faces, which are important for creating interactive scenarios," says Izadi.

So what challenges did the team encounter? "The problem of inferring depth from intensity images is a challenging or even ill-posed problem within computer vision known as shape from shading," says Izadi. "What we highlight is that by constraining the problem to interactive scenarios, and using active illumination and state-of-the-art machine-learning techniques, we can actually solve this problem for specific scenarios of use. It opens up many new areas of applications and research, because now depth cameras can be as cheap as any off-the-shelf web camera, and now anywhere a camera exists, such as in your mobile phone, a depth camera can also exist."
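To make the idea of learning a mapping from image intensity to depth a little more concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it fits a tiny one-dimensional regression tree to synthetic data, and it assumes that, under active illumination, a pixel's brightness falls off with the square of its distance from the camera. The function names, the training values, and the falloff model are all illustrative assumptions.

```python
# Illustrative sketch only: a tiny regression tree on synthetic data,
# not the algorithm from the paper.
from statistics import mean


def squared_error(samples):
    """Sum of squared deviations of the depth targets from their mean."""
    depths = [d for _, d in samples]
    centre = mean(depths)
    return sum((d - centre) ** 2 for d in depths)


def fit_tree(samples, level=0, max_level=4):
    """Fit a regression tree to (intensity, depth_cm) pairs."""
    ordered = sorted(samples)
    best = None
    if level < max_level:
        for i in range(1, len(ordered)):
            if ordered[i - 1][0] == ordered[i][0]:
                continue  # cannot split between identical intensities
            left, right = ordered[:i], ordered[i:]
            error = squared_error(left) + squared_error(right)
            if best is None or error < best[0]:
                threshold = (ordered[i - 1][0] + ordered[i][0]) / 2
                best = (error, threshold, left, right)
    if best is None:
        # No useful split (or depth limit reached): predict the mean depth.
        return {"leaf": mean(d for _, d in ordered)}
    _, threshold, left, right = best
    return {
        "threshold": threshold,
        "dimmer": fit_tree(left, level + 1, max_level),
        "brighter": fit_tree(right, level + 1, max_level),
    }


def predict_depth(tree, intensity):
    """Walk the tree from the root to a leaf and return its depth estimate."""
    while "leaf" not in tree:
        branch = "dimmer" if intensity < tree["threshold"] else "brighter"
        tree = tree[branch]
    return tree["leaf"]


# Hypothetical training data: assumed inverse-square falloff of brightness
# with distance, for depths from 20 cm to 100 cm.
training = [(10000.0 / d ** 2, float(d)) for d in range(20, 101, 10)]
tree = fit_tree(training)

# Estimate the depth for the brightness an object at 50 cm would produce.
print(predict_depth(tree, 10000.0 / 50 ** 2))
```

A real system would work on image patches rather than single brightness values and would be trained on captured data, but the sketch shows why constraining the scene and controlling the illumination helps: brightness then becomes a usable cue for distance.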

High-fidelity facial animation data

Another paper being presented at SIGGRAPH, Controllable High-Fidelity Facial Performance Transfer, is the result of a collaboration between Feng Xu, associate researcher from Microsoft Research Asia, and researchers at Texas A&M and Tsinghua University. The paper introduces a novel facial expression transfer and editing technique for high-fidelity facial animation data.

The key idea is to decompose high-fidelity facial performances into large-scale facial deformation and fine-scale facial details, and then to recombine them into the desired retargeted animation. This approach provides a new technique for animating a digital character that is not necessarily a digital replica of the performer. The approach takes a source reference expression, a source facial animation sequence, and a target expression as input, and outputs a retargeted animation sequence that is "analogous" to the source animation. Importantly, it allows the user to control and adjust both the large-scale deformation and the fine-scale facial details of the retargeted animation, reducing the manual correction and adjustment an animator typically has to do.
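The inputs and outputs described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the paper's method: it treats a face as a short list of vertex heights, uses a moving average as a stand-in for the large-scale deformation, and treats whatever the average leaves out as the fine-scale detail. The function names and the two gain parameters are assumptions made for this example.

```python
# Illustrative sketch only: a 1-D stand-in for large-scale/fine-scale
# decomposition, not the decomposition used in the paper.

def smooth(values, radius=1):
    """Moving average over neighbouring vertices, clamped at the ends."""
    count = len(values)
    smoothed = []
    for i in range(count):
        lo, hi = max(0, i - radius), min(count, i + radius + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def retarget(source_ref, source_frames, target_expr,
             large_gain=1.0, fine_gain=1.0):
    """Apply each source frame's motion to the target expression.

    The motion of a frame is its offset from the source reference
    expression. The offset is split into a smooth (large-scale) part and
    a residual (fine-scale) part, and the two gains let a user scale each
    part independently.
    """
    retargeted = []
    for frame in source_frames:
        offset = [f - r for f, r in zip(frame, source_ref)]
        large = smooth(offset)
        fine = [o - l for o, l in zip(offset, large)]
        retargeted.append([
            t + large_gain * l + fine_gain * d
            for t, l, d in zip(target_expr, large, fine)
        ])
    return retargeted


# Hypothetical data: five vertices, one animated frame.
source_ref = [0.0, 0.0, 0.0, 0.0, 0.0]
source_frames = [[0.0, 1.0, 3.0, 1.0, 0.0]]
target_expr = [5.0, 5.0, 6.0, 5.0, 5.0]

full = retarget(source_ref, source_frames, target_expr)
coarse_only = retarget(source_ref, source_frames, target_expr, fine_gain=0.0)
print(full[0])
print(coarse_only[0])
```

With both gains left at 1.0 the source motion is copied onto the target unchanged. Lowering `fine_gain` suppresses the detail layer while keeping the broad motion, which is the kind of user control the paragraph above describes.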

"More and more high-fidelity facial data is being captured by recent techniques," says Xu. "Our technique aims to reuse existing high-fidelity facial data to generate animations on new characters or avatars. Besides faithfully transferring the input facial performance with our decomposition scheme, we give users easy and flexible control to further edit both the large-scale motion and the facial details. This user control makes it possible to get good results on a target whose shape differs greatly from the source, like a dog or a monster. It also makes it possible to change the style of the captured motion to satisfy the user's requirements, which is useful for animators generating animations."

Hyper-lapse video conversion

No strangers to SIGGRAPH are the authors of the First-Person Hyper-Lapse Video paper: Microsoft researcher Johannes Kopf (@JPKopf), principal researcher Michael Cohen, and Richard Szeliski (@szeliski), distinguished scientist and previous SIGGRAPH Computer Graphics Achievement award winner. The paper provides a method for converting first-person videos into hyper-lapse videos. Seeing is believing, and the results, which you can view in this video, are astounding. More information on this work can be found in a recent Next at Microsoft blog post.

Izadi sums up: "SIGGRAPH is one of the premier conferences in computer science, with one of the highest impact factors. It is also the intersection of many different research fields, not just computer graphics, but human-computer interaction, computer vision, machine learning, and even new sensors, displays and hardware. So there's inspiration to draw from many research areas, and it nicely complements the multi-discipline nature of Microsoft Research. There are not only great technical talks at the conference, including 10 from Microsoft Research, but also E-tech, which showcases lots of demos, and highlights the interactive element of the conference, which is something that really resonates with us."
