Automatic Acquisition of High-fidelity Facial Performances Using Monocular Videos
- Xin Tong, Microsoft
This paper presents a facial performance capture system that automatically captures high-fidelity facial performances from uncontrolled monocular videos (e.g., Internet videos). We start by detecting and tracking important facial features, such as the nose tip and mouth corners, across the entire sequence. We then use the detected features along with multilinear facial models to reconstruct the 3D head pose and large-scale facial deformation of the subject at each frame. Next, we use per-pixel shading cues to add fine-scale surface details, such as emerging or disappearing wrinkles and folds, to the large-scale facial deformation. As a final step, we iterate our reconstruction procedure on large-scale facial geometry and fine-scale facial details to further improve the accuracy of the reconstruction. We have tested our system on monocular videos downloaded from the Internet, demonstrating its accuracy and robustness under a variety of uncontrolled lighting conditions and across significant shape differences between individuals. By comparing against alternative methods, we show that our system advances the state of the art in facial performance capture.
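To make the multilinear-model step more concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a core tensor indexed by vertex, coordinate, identity, and expression, a weak-perspective camera, and a head pose and identity that are already known, so that only the expression weights are solved for by linear least squares. All function names, tensor shapes, and the camera model are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def multilinear_shape(core, w_id, w_exp):
    """Contract a core tensor of shape (n_vertices, 3, n_id, n_exp)
    with identity and expression weights to get (n_vertices, 3) points."""
    return np.einsum("vcie,i,e->vc", core, w_id, w_exp)

def project_weak_perspective(points, scale, rotation, translation):
    """Project (n, 3) points to (n, 2): rotate, keep x and y, scale, shift.
    The weak-perspective camera is an assumption made for this sketch."""
    return scale * points @ rotation[:2].T + translation

def fit_expression(core, w_id, landmarks_2d, scale, rotation, translation):
    """Least-squares expression weights with identity and pose held fixed."""
    # With identity fixed, the model is linear in the expression weights.
    # basis has shape (n_vertices, 3, n_exp).
    basis = np.einsum("vcie,i->vce", core, w_id)
    # Project every basis shape (translation excluded): (n_vertices, 2, n_exp).
    projected = scale * np.einsum("pc,vce->vpe", rotation[:2], basis)
    A = projected.reshape(-1, basis.shape[-1])
    b = (landmarks_2d - translation).reshape(-1)
    w_exp, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w_exp
```

In a full pipeline, pose, identity, and expression would typically be estimated in alternation from tracked 2D features; the sketch isolates the one sub-problem that is linear.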
Xin Tong
Partner Research Manager