
Game engine based synthetic data generation schemes and convolutional neural networks

Abstract
This thesis presents designs and implementations for generating synthetic images with game engines to train CNNs (Convolutional Neural Networks). It also investigates two fundamental properties of CNNs: their performance characteristics as the number of target classes varies, and their training characteristics under our novel learning rate tuning method. Training a CNN for a computer vision problem requires a huge number of annotated images, and collecting such training data is labour-intensive and expensive. Part or all of the training data can instead be synthesized, using a wide variety of methods.

Our first contribution is to synthesize aerial top-down images and thereby demonstrate the feasibility of two domain transfers at once: synth-to-real (training on synthetic data and predicting on real data) and front-facing-to-aerial (taking a CNN pretrained on consumer camera images, which are primarily front-facing, and finetuning and testing it on aerial top-down images). We generated the synthetic data from a realistic virtual 3D game environment by programmatically flying a quadrocopter Robotic Aerial Vehicle (RAV) inside the game and annotating the images captured by its camera. We then demonstrated the dual domain transfer by detecting aerial-view real-world objects with a CNN trained on our synthetic data.

Our second contribution is the design, development, and evaluation of a hybrid synthetic data generation approach that combines the realistic lighting and object placement of a 3D game engine with complex textures and backgrounds sourced from the internet. A network finetuned on synthetic data collected this way outperforms the same network finetuned on real data when tested on the challenging ObjectNet dataset, and sets a state-of-the-art result among convolutional neural networks on ObjectNet.

Our third contribution is an investigation into how the performance of CNNs changes as the number of classes to predict increases. We conduct a systematic study of three ubiquitous computer vision tasks: image classification, object detection, and semantic segmentation, examining how performance varies with the number of class labels while controlling for variables such as CNN architecture and training methodology, and using multiple datasets for each task. We find that performance decreases with an increasing number of classes in image classification and semantic segmentation but, conversely, improves with more classes in object detection. We explore this difference by visualizing and analyzing feature maps in terms of their clustering performance, and conclude that in object detection the feature map clusters become tighter and better separated as the number of classes grows, leading to the increase in performance.
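Two of the ideas above lend themselves to short illustrations. First, a minimal sketch of the hybrid data generation idea from the second contribution: compositing an engine-rendered object (exported with an alpha channel) over an internet-sourced background photograph. The Pillow-based pipeline and all file names are hypothetical assumptions; the thesis's actual generation scheme runs inside the game engine and is more elaborate.

```python
# Illustrative sketch only: composites a game-engine render (with an alpha
# channel) over a background photo, in the spirit of combining engine-rendered
# lighting/placement with internet-sourced backgrounds. File names are
# hypothetical.
from PIL import Image

def composite(render_path: str, background_path: str, out_path: str) -> None:
    foreground = Image.open(render_path).convert("RGBA")   # engine render
    background = Image.open(background_path).convert("RGBA")
    background = background.resize(foreground.size)
    # Alpha compositing keeps the engine's lighting and object placement
    # while the background supplies real-world texture complexity.
    Image.alpha_composite(background, foreground).convert("RGB").save(out_path)

composite("render_rgba.png", "web_background.jpg", "hybrid_sample.jpg")
```

Second, a minimal sketch of the feature map clustering analysis from the third contribution, assuming PyTorch/torchvision and scikit-learn: extract penultimate-layer features for labeled images and score how tight and well separated the per-class clusters are. The ResNet-50 backbone, the silhouette metric, and the FakeData stand-in are illustrative assumptions, not the thesis's exact pipeline.

```python
# Illustrative sketch only: scores how tightly class-wise feature clusters
# group and how well they separate. Model, metric, and data are assumptions.
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import FakeData  # stand-in for a real dataset
from torch.utils.data import DataLoader
from sklearn.metrics import silhouette_score

# Pretrained backbone with the classifier head removed, so the forward
# pass yields penultimate-layer feature vectors.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

transform = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
dataset = FakeData(size=64, num_classes=8, transform=transform)
loader = DataLoader(dataset, batch_size=16)

features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(backbone(images))
        labels.append(targets)
features = torch.cat(features).numpy()
labels = torch.cat(labels).numpy()

# Silhouette score lies in [-1, 1]: higher values indicate tighter,
# better-separated class clusters in feature space.
print("silhouette:", silhouette_score(features, labels))
```

On real datasets, comparing such a score across models trained with increasing numbers of classes would quantify the tightening and separation of clusters that the abstract reports for object detection.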
Publisher
University of Galway
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International