Real-time Facial Performance Capture and Manipulation

dc.contributor.advisor: Deng, Zhigang
dc.contributor.committeeMember: Mayerich, David
dc.contributor.committeeMember: Chen, Guoning
dc.contributor.committeeMember: Shah, Shishir Kirit
dc.creator: Ma, Luming
dc.date.accessioned: 2020-06-04T02:12:50Z
dc.date.created: May 2020
dc.date.issued: 2020-05
dc.date.submitted: May 2020
dc.date.updated: 2020-06-04T02:12:50Z
dc.description.abstract: The acquisition and editing of facial performance is an essential and challenging task in computer graphics, with broad applications in films, cartoons, VR systems, and electronic games. Creating high-resolution, realistic facial animation often involves controlled lighting setups, multiple cameras, active markers, depth sensors, and substantial post-editing by experienced artists. This dissertation focuses on capturing and manipulating facial performance from regular RGB video.

First, a novel method is proposed to reconstruct high-resolution facial geometry and appearance in real time, capturing an individual-specific face model with fine-scale details from monocular RGB video input. Specifically, after a coarse facial model is reconstructed from the input video, it is refined using shape-from-shading techniques: illumination, albedo texture, and displacements are recovered by minimizing the difference between the synthesized face and the input RGB video. To recover wrinkle-level details, a hierarchical face pyramid is built through adaptive subdivision and progressive refinement of the mesh from coarse to fine levels. The proposed approach produces results close to off-line methods and better than previous real-time methods.

On top of the reconstruction method, two manipulation approaches are proposed, one over facial expressions and one over facial appearance: facial expression transformation and face swapping. In facial expression transformation, desired, photo-realistic facial expressions are generated directly on the input monocular RGB video, without the need for a driving source actor. An unpaired learning framework is developed to learn the mapping between any two facial expressions in the facial blendshape space. The proposed method automatically transforms the source expression in an input video clip to a specified target expression by combining 3D face reconstruction, the learned bi-directional expression mapping, and automatic lip correction. It can be applied to new users with different identities, ages, speech, and expressions, without additional training.

In face swapping, a high-fidelity method is presented to replace the face in a target video clip with the face from a single source portrait image. First, the face reconstruction method is run on both the source image and the target video. Then, the albedo of the source face is modified by a novel harmonization method to match the target face. The face geometry is predicted as the source identity performing the target expression, with a person-specific wrinkle style. Finally, the source face is re-rendered and blended into the target video using the lighting and camera parameters of the target video. The proposed method runs fully automatically and in real time on any target face captured by a camera or taken from legacy video. More importantly, unlike existing deep-learning-based methods, it does not require pre-training any models, i.e., pre-collecting a large image/video dataset of the source or target face for model training is not required.
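As a rough sketch of the shading-based refinement described in the abstract: under an assumed Lambertian reflectance with second-order spherical-harmonics (SH) lighting (a common model in shape-from-shading work; the dissertation's exact formulation may differ), lighting coefficients can be recovered by least squares from per-vertex albedo, normals, and observed intensities, and the remaining photometric residual is what geometry refinement would minimize. All names below are illustrative, not from the dissertation:

```python
# Minimal sketch of shading-based refinement under assumed Lambertian
# reflectance + 2nd-order SH lighting; names are hypothetical.
import numpy as np

def sh_basis(normals):
    """9 second-order SH basis values at unit normals (N,3).
    Normalization constants are omitted; they are absorbed into the
    fitted lighting coefficients."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.ones_like(x),        # l=0
        y, z, x,                # l=1
        x * y, y * z,           # l=2
        3.0 * z**2 - 1.0,
        x * z, x**2 - y**2,
    ], axis=1)

def estimate_lighting(albedo, normals, observed):
    """Least-squares SH lighting coefficients from per-vertex intensities."""
    A = albedo[:, None] * sh_basis(normals)          # (N, 9) design matrix
    coeffs, *_ = np.linalg.lstsq(A, observed, rcond=None)
    return coeffs

def shading_residual(albedo, normals, lighting, observed):
    """Photometric error the refinement step would minimize over geometry."""
    synthesized = albedo * (sh_basis(normals) @ lighting)
    return synthesized - observed

# Toy usage with random data standing in for a reconstructed coarse mesh.
rng = np.random.default_rng(0)
n = rng.normal(size=(500, 3)); n /= np.linalg.norm(n, axis=1, keepdims=True)
rho = rng.uniform(0.2, 0.9, size=500)                # per-vertex albedo
gt_light = rng.normal(size=9)
obs = rho * (sh_basis(n) @ gt_light)                 # synthetic observations
est = estimate_lighting(rho, n, obs)
print(np.abs(shading_residual(rho, n, est, obs)).max())  # ~0 on clean data
```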
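The expression transformation operates in blendshape-weight space: a posed face is the neutral mesh plus a weighted sum of blendshape offsets, and the learned mapping carries a source weight vector to a target one. The sketch below shows the blendshape synthesis plus a placeholder linear mapping standing in for the trained unpaired model; the linear form and all names are assumptions:

```python
# Blendshape sketch: posed face = neutral + sum_i w_i * (B_i - neutral).
# map_expression is a hypothetical stand-in for the learned bi-directional
# mapping; a real system would evaluate the trained network instead.
import numpy as np

def pose_face(neutral, blendshapes, weights):
    """neutral: (V,3) mesh; blendshapes: (K,V,3) targets; weights: (K,)."""
    deltas = blendshapes - neutral[None, :, :]       # per-shape offsets
    return neutral + np.tensordot(weights, deltas, axes=1)

def map_expression(weights, W, b):
    """Hypothetical weight-space mapping: w_target = clip(W @ w_source + b)."""
    return np.clip(W @ weights + b, 0.0, 1.0)        # keep weights in [0, 1]

# Toy usage with random stand-ins for a real blendshape rig.
rng = np.random.default_rng(1)
neutral = rng.normal(size=(100, 3))                  # 100-vertex "mesh"
shapes = neutral + 0.1 * rng.normal(size=(5, 100, 3))
w_src = rng.uniform(0.0, 1.0, size=5)                # source expression weights
W, b = np.eye(5), 0.05 * rng.normal(size=5)          # placeholder mapping
face = pose_face(neutral, shapes, map_expression(w_src, W, b))
print(face.shape)                                    # (100, 3)
```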
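The albedo harmonization step in the face-swapping pipeline amounts to a color transfer from the source face's skin statistics to the target's. A minimal stand-in, assuming a simple per-channel mean/variance match rather than the dissertation's actual harmonization method:

```python
# Albedo harmonization sketch: match per-channel mean/std of the source
# face albedo to the target's. A simple stand-in, not the actual method.
import numpy as np

def harmonize_albedo(source, target, eps=1e-6):
    """source, target: (..., 3) RGB albedo maps over the face region."""
    src_mu = source.reshape(-1, 3).mean(axis=0)
    src_sd = source.reshape(-1, 3).std(axis=0) + eps
    tgt_mu = target.reshape(-1, 3).mean(axis=0)
    tgt_sd = target.reshape(-1, 3).std(axis=0)
    return (source - src_mu) / src_sd * tgt_sd + tgt_mu

# Toy usage: the harmonized output inherits the target's color statistics.
rng = np.random.default_rng(2)
src = rng.uniform(0.3, 0.8, size=(64, 64, 3))        # stand-in source albedo
tgt = rng.uniform(0.1, 0.6, size=(64, 64, 3))        # stand-in target albedo
out = harmonize_albedo(src, tgt)
print(out.reshape(-1, 3).mean(axis=0), tgt.reshape(-1, 3).mean(axis=0))
```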
dc.description.department: Computer Science, Department of
dc.format.digitalOrigin: born digital
dc.format.mimetype: application/pdf
dc.identifier.citation: Portions of this document appear in: Ma, Luming, and Zhigang Deng. "Real-time hierarchical facial performance capture." In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 1-10. 2019. And in: Ma, Luming, and Zhigang Deng. "Real-Time Facial Expression Transformation for Monocular RGB Video." In Computer Graphics Forum, vol. 38, no. 1, pp. 470-481. 2019. And in: Ma, Luming, and Zhigang Deng. "Real-time Face Video Swapping From A Single Portrait." In Symposium on Interactive 3D Graphics and Games, pp. 1-10. 2020.
dc.identifier.uri: https://hdl.handle.net/10657/6693
dc.language.iso: eng
dc.rights: The author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. UH Libraries has secured permission to reproduce any and all previously published materials contained in the work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subject: face reconstruction
dc.subject: expression transformation
dc.subject: face swapping
dc.subject: RGB video
dc.subject: GPU optimization
dc.subject: shape-from-shading
dc.title: Real-time Facial Performance Capture and Manipulation
dc.type.dcmi: Text
dc.type.genre: Thesis
local.embargo.lift: 2022-05-01
local.embargo.terms: 2022-05-01
thesis.degree.college: College of Natural Sciences and Mathematics
thesis.degree.department: Computer Science, Department of
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Houston
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: MA-DISSERTATION-2020.pdf
Size: 8.5 MB
Format: Adobe Portable Document Format

License bundle

Name: PROQUEST_LICENSE.txt
Size: 4.42 KB
Format: Plain Text
Name: LICENSE.txt
Size: 1.81 KB
Format: Plain Text