Automatic Acquisition of High-fidelity Facial Performances Using Monocular Videos

November 19, 2014
Xin Tong, Microsoft

This paper presents a facial performance capture system that automatically captures high-fidelity facial performances using uncontrolled monocular videos (e.g., Internet videos). We start the process by detecting and tracking important facial features such as the nose tip and mouth corners across the entire sequence and then use the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation of the subject at each frame. We utilize per-pixel shading cues to add finescale surface details such as emerging or disappearing wrinkles and folds into large-scale facial deformation. At a final step, we iterate our reconstruction procedure on large-scale facial geometry and fine-scale facial details to further improve the accuracy of facial reconstruction. We have tested our system on monocular videos downloaded from the Internet, demonstrating its accuracy and robustness under a variety of uncontrolled lighting conditions and overcoming significant shape differences across individuals. We show our system advances the state of the art in facial performance capture by comparing against alternative methods.