Automatic 3D Facial Performance Acquisition and Animation using Monocular Videos
Facial performance capture and animation is an essential component of many applications such as movies, video games, and virtual environments. Video-based facial performance capture is particularly appealing as it offers the lowest cost and the potential use of legacy sources and uncontrolled videos. However, it is also challenging because of complex facial movements at different scales, ambiguity caused by the loss of depth information, and a lack of discernible features on most facial regions. Unknown lighting conditions and camera parameters further complicate the problem. This dissertation explores video-based 3D facial performance capture systems that use a single video camera, overcome the aforementioned challenges, and produce accurate and robust reconstruction results. We first develop a novel automatic facial feature detection/tracking algorithm that accurately locates important facial features across the entire video sequence; these features are then used for 3D pose and facial shape reconstruction. The key idea is to combine the respective strengths of local detection, spatial priors for facial feature locations, Active Appearance Models (AAMs), and temporal coherence for facial feature detection. The algorithm runs in real time and is robust to large pose and expression variations and to occlusions. We then present an automatic high-fidelity facial performance capture system that works on monocular videos. It uses the detected facial features along with multilinear facial models to reconstruct 3D head poses and large-scale facial deformation, and uses per-pixel shading cues to add fine-scale surface details such as emerging or disappearing wrinkles and folds. We iterate the reconstruction procedure on large-scale facial geometry and fine-scale facial details to improve the accuracy of facial reconstruction.
We further improve the accuracy and efficiency of the large-scale facial performance capture by introducing a 2D feature regression based on local binary features and a pose and expression regression based on a convolutional neural network, and complement it with an efficient 3D eye gaze tracker to achieve real-time 3D eye gaze animation. We have tested our systems on various monocular videos, demonstrating their accuracy and robustness under a variety of uncontrolled lighting conditions and despite significant shape differences across individuals.
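As a rough illustration of the multilinear facial models mentioned in the abstract, the sketch below evaluates a generic multilinear model: a core tensor indexed by vertex coordinate, identity mode, and expression mode is contracted with an identity weight vector and an expression weight vector to give mesh coordinates. This is a minimal sketch of the general technique, not the dissertation's implementation; the function name `multilinear_mesh`, the tensor layout, and the toy values are all assumptions made for the example.

```python
def multilinear_mesh(core, w_id, w_exp):
    """Contract a (coordinate x identity x expression) core tensor
    with identity and expression weights (hypothetical layout)."""
    return [
        sum(
            core[v][i][e] * w_id[i] * w_exp[e]
            for i in range(len(w_id))
            for e in range(len(w_exp))
        )
        for v in range(len(core))
    ]


# Toy core tensor (made-up values): 3 coordinates of a single vertex,
# 2 identity modes, 2 expression modes.
core = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]

# First identity mode combined with the second expression mode.
vertex = multilinear_mesh(core, w_id=[1.0, 0.0], w_exp=[0.0, 1.0])
```

In a fitting setting, the identity and expression weights (together with head pose) would typically be the unknowns, estimated so that the projected model agrees with the detected 2D facial features.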
Shi, Fuhao (2017). Automatic 3D Facial Performance Acquisition and Animation using Monocular Videos. Doctoral dissertation, Texas A & M University. Available electronically from