Publication

ChildDiffusion: Unlocking the potential of Generative AI and controllable augmentations for child facial data using stable diffusion and large language models

Farooq, Muhammad Ali
Yao, Wang
Corcoran, Peter
Citation
Farooq, M. A., Yao, W., & Corcoran, P. (2025). ChildDiffusion: Unlocking the potential of Generative AI and controllable augmentations for child facial data using stable diffusion and large language models. IEEE Access, 13, 96616-96634. https://doi.org/10.1109/ACCESS.2025.3575964
Abstract
Ensuring the availability of child facial datasets is essential for advancing AI applications, yet legal, ethical, and data scarcity concerns pose significant challenges. Current generative models such as StyleGAN excel at producing synthetic facial data but struggle with temporal consistency, control over output attributes, and diversity in rendered features. These limitations underscore the need for a more robust and adaptable framework. In this research, we propose the ChildDiffusion framework, designed to generate photorealistic child facial data using diffusion models. The framework integrates intelligent augmentations via short text prompts, employs various image samplers, and leverages ControlNet for enhanced model conditioning. Additionally, we use large language models (LLMs) to provide complex textual guidance that enables precise image-to-image transformations, facilitating the curation of diverse, high-quality datasets. The model was validated by generating child faces with varied ethnicities, facial expressions, poses, lighting conditions, eye-blinking effects, accessories, hair colors, and multi-subject compositions. To exemplify its potential, we open-sourced a dataset of 2.5k child facial samples across five ethnic classes, which underwent rigorous qualitative and quantitative evaluations. Further, we fine-tuned a Vision Transformer model to classify child ethnicity as a downstream task, demonstrating the framework’s utility. This research advances generative AI by addressing data scarcity and ethical challenges, showcasing how diffusion models can produce realistic child facial data while ensuring compliance with privacy standards. The versatile ChildDiffusion framework offers broad potential for machine learning applications, serving as a valuable tool for AI innovation. The project website, along with the complete ChildRace dataset and the fine-tuned model, is available at https://mali-farooq.github.io/childdiffusion/.
Publisher
Institute of Electrical and Electronics Engineers
Publisher DOI
https://doi.org/10.1109/ACCESS.2025.3575964
Rights
Attribution 4.0 International