Contributions to data augmentation techniques and synthetic data for training deep neural networks
Varkarakis, Viktor
Publication Date
2022-06-10
Type
Thesis
Abstract
In recent years deep learning has become increasingly popular and has been applied in a variety of fields, yielding outstanding results in different machine learning applications. Deep learning based solutions thrive when a large amount of data is available for a specific problem, but data availability and preparation are the biggest bottlenecks in deep learning pipelines. In a fast-changing technology environment, new and unique problems arise daily. To realise solutions in many of these specific problem domains, there is a growing need to build custom datasets that are tailored to a particular use case and come with matching ground truth data. Acquiring such datasets at the scale required for training today’s AI systems, and subsequently annotating them with accurate ground truth, is challenging. Furthermore, with the recent introduction of GDPR and its associated complications, industry now faces additional challenges in collecting training data that is linked to individual persons. This dissertation focuses on ways to overcome the unavailability of real data and to avoid the challenges that come with a data acquisition process. More specifically, data augmentation techniques are proposed to overcome the unavailability of real data, improve performance, and allow the use of low-complexity models suitable for implementation on edge devices. Furthermore, the idea of using AI tools to build large synthetic datasets is considered as an alternative to real data samples. The first steps towards building synthetic datasets and incorporating them effectively into deep learning training pipelines are: building AI tools that generate large amounts of new data and/or augment these data samples; and creating methodologies and techniques to validate that the generated data behave like real data and to measure whether their use is effective when incorporated into training pipelines. This dissertation contributes to both of these steps.
Publisher
NUI Galway