Publication

Contributions to neural network models and training datasets for facial depth

Khan, Faisal
Abstract
Depth estimation has seen significant progress due to recent improvements in Convolutional Neural Networks (CNNs) and the incorporation of traditional methodologies into these deep learning systems. It is one of the fundamental computer vision tasks, as it involves the inverse problem of reconstructing three-dimensional scene structure from two-dimensional projections. Because monocular cameras are compact and inexpensive, there has been significant and increasing interest in depth estimation from a single RGB image. Current single-view depth estimation techniques, however, are too slow for real-time inference on an embedded platform and are based on fairly large deep neural networks that require a wide range of training sets. Because dense ground-truth depth is difficult to obtain at scale across various environments, a range of datasets with distinctive features and biases has developed. This thesis first provides a survey of depth estimation datasets, techniques, studies, trends, challenges, loss functions, and opportunities for open research. For effective depth estimation from a single image frame, a method is then proposed to generate high-accuracy synthetic human facial depth data from synthetic 3D face models, which makes it possible to train CNN models to address facial depth estimation challenges. To validate the synthetic facial depth data, a brief comparative analysis of state-of-the-art depth estimation algorithms on individual image frames from the generated synthetic dataset is presented. Following that, two different lightweight encoder-decoder neural networks are proposed for training on the generated dataset; when tested and evaluated across four public datasets, the proposed networks are shown to be computationally efficient and to outperform the current state of the art.
The proposed lightweight models have low complexity, making them suitable for implementation on edge devices. Synthetic human facial depth data can help overcome the lack of real data and can improve the performance of deep learning methods for depth map estimation.
Publisher
NUI Galway
Rights
Attribution-NonCommercial-NoDerivs 3.0 Ireland
CC BY-NC-ND 3.0 IE