Publication

Contributions towards 3D synthetic facial data generation and 3D face analysis with weak supervision

Basak, Shubhajit
Identifiers
http://hdl.handle.net/10379/18085
https://doi.org/10.13025/17017
Publication Date
2024-03-11
Type
Thesis
Abstract
Facial analysis tasks are of pivotal importance in social interaction and have therefore gained extensive attention in the scientific community. With the increasing popularity of deep learning models and the availability of high-performance infrastructure, deep learning has become the de facto tool for facial analysis applications. However, for 3D facial analysis tasks such as 3D face alignment, face reconstruction, and facial expression analysis, the availability of high-quality 3D face data is the biggest bottleneck. In particular, collecting accurate real-world ground-truth pose and depth information is very challenging because of the limitations of real-world sensors. Furthermore, with the recent introduction of data privacy laws such as the GDPR and their associated restrictions, collecting face datasets has become even more difficult, as it involves human subjects. With the advancement of computer graphics tools, domain-specific data generation with accurate annotations has become a feasible alternative to real data. Although synthetic data is a viable option for deep learning training, the resulting domain gap between synthetic and real environments still makes it difficult for a trained model to perform well in real-world scenarios. As a result, another family of approaches has gained popularity: unsupervised learning, where the model learns the objective without any annotated data. In this dissertation, we address the unavailability of high-quality, accurate real face data by combining these two approaches. Using low-cost digital asset creation software and an open-source computer graphics tool, we first build a pipeline to create a large synthetic face dataset. We render around 300k synthetic face images with extensive data diversity, including different scene illuminations, backgrounds, and facial expressions, together with ground-truth annotations such as 3D head pose and raw facial depth.
We validate the synthetic data on two facial analysis tasks: head pose estimation and face depth estimation. While learning head pose from the synthetic images, we propose an unsupervised domain-adversarial learning methodology to reduce the domain gap between real and synthetic face images. We show that our method achieves near state-of-the-art (SOTA) results with unsupervised training, compared to supervised methods that rely solely on real data to train their models.
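The domain-adversarial idea summarized above (training a feature extractor against a domain discriminator so that synthetic and real features become indistinguishable) can be sketched as a toy gradient-reversal loop. Everything below is an illustrative assumption, not the thesis's actual architecture: a linear feature extractor, a logistic domain discriminator, and hand-picked values for the reversal weight `lambda_` and the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "synthetic" and "real" inputs, separated by a mean shift
# that stands in for the synthetic-to-real domain gap.
x_syn = rng.normal(0.0, 1.0, size=(64, 4))
x_real = rng.normal(1.0, 1.0, size=(64, 4))

W_f = rng.normal(0.0, 0.1, size=(4, 4))  # feature extractor (assumed linear)
w_d = rng.normal(0.0, 0.1, size=4)       # domain discriminator weights
lambda_, lr = 1.0, 0.1                   # illustrative hyperparameters

x = np.vstack([x_syn, x_real])
d = np.concatenate([np.zeros(64), np.ones(64)])  # 0 = synthetic, 1 = real

for step in range(200):
    f = x @ W_f                  # shared features
    p = sigmoid(f @ w_d)         # discriminator's P(domain = real)
    # Discriminator update: descend the binary cross-entropy gradient.
    g_wd = f.T @ (p - d) / len(d)
    # Gradient w.r.t. the features; the gradient-reversal layer flips its
    # sign (scaled by lambda_) before it reaches the extractor, so the
    # extractor is pushed to *maximize* domain confusion.
    g_f = np.outer(p - d, w_d) / len(d)
    g_Wf = x.T @ (-lambda_ * g_f)
    w_d -= lr * g_wd
    W_f -= lr * g_Wf

# Domain-classification accuracy of the final discriminator; under
# successful adversarial training it should drift toward chance level.
acc = np.mean((p > 0.5) == (d == 1))
```

In the full method, the extractor would also receive a task gradient (e.g. a head pose loss on the labeled synthetic images), so the features stay useful for the task while losing domain-specific cues.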
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland
CC BY-NC-ND 3.0 IE