Getting more from your datasets: Data augmentation, annotation and generative techniques

Corcoran, Peter
Bazrafkan, Shabab
Lemley, Joe
Corcoran, Peter, Bazrafkan, Shabab, & Lemley, Joe. (2018). Getting more from your datasets: Data augmentation, annotation and generative techniques. Paper presented at the Embedded Vision Summit, Santa Clara Convention Center, Silicon Valley, Santa Clara, California, 22-23 May.
Deep Learning for embedded vision requires large datasets. Indeed, the more varied the training data, the more accurate the trained network. But acquiring and accurately annotating datasets costs time and money. This talk will show how to get more from existing datasets. Firstly, state-of-the-art data augmentation techniques are reviewed, and a new approach, smart augmentation, is explained: CNN network-A is trained to learn optimal augmentation strategies for CNN network-B. Secondly, Generative Adversarial Networks (GANs) learn the structure of an existing dataset, and several example use cases show how GANs can generate new data corresponding to the original dataset. The example of creating a very large dataset of facial training data is presented. But building a dataset is not the whole problem: data must be annotated in a way that is meaningful for the training process. An example of training a GAN from a dataset that incorporates annotations is given. This enables pre-annotated data to be generated, providing an exciting way to create large datasets at significantly reduced cost.
Attribution-NonCommercial-NoDerivs 3.0 Ireland