Deepfake audio detection in low-resource languages: A case study of Urdu
Owais, Muhammad ; Jadoon, Khurram Khan ; Sandhu, Ali Imran ; Ali, Zaiwar ; Mahmood, Zahid ; Yahya, Muhammad ; Wahid, Abdul
Publication Date
2026-01-16
Type
journal article
Citation
Owais, M., Jadoon, K. Khan, Sandhu, A. I., Ali, Z., Mahmood, Z., Yahya, M., & Wahid, A. (2026). Deepfake Audio Detection in Low-Resource Languages: A Case Study of Urdu. IEEE Access, 14, 12407-12421. https://doi.org/10.1109/ACCESS.2026.3654621
Abstract
The rapid advancement of Generative Artificial Intelligence (AI) has enabled the creation of highly realistic synthetic audio, presenting significant challenges to digital security, media forensics, and public confidence. While deepfake detection has been extensively explored for high-resource languages like English, low-resource languages remain critically underexamined. This paper introduces a systematic benchmark of deepfake audio detection methods for Urdu, a language characterized by its rich morphology and phonetic complexity. We evaluate convolutional and transformer-based architectures, including LCNN, CNN–LSTM with Attention, and Whisper variants, employing Mel-Frequency Cepstral Coefficients (MFCC) and Linear Frequency Cepstral Coefficients (LFCC) as front-end features. To address data scarcity, we construct three dataset configurations (baseline, augmented, and extended), derived from an existing Urdu deepfake corpus and enhanced through controlled augmentation and additional recordings. Our experiments, conducted with multiple random seeds and statistical validation, demonstrate that MFCC-based models, particularly Whisper-small, achieve strong performance with an Equal Error Rate (EER) as low as 0.50%. Robustness tests under noise, pitch, and tempo perturbations highlight the limitations of lightweight CNNs and underscore the advantages of transformer embeddings for handling Urdu’s linguistic variability. This study represents the first structured benchmark of deepfake audio detection techniques for Urdu, offering empirical insights into how language characteristics influence model performance and generalization. The findings emphasize the importance of multilingual evaluation in the development of trustworthy speech forensics systems.
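For readers unfamiliar with the Equal Error Rate reported in the abstract, the following is a minimal sketch of how the metric is commonly computed from detector scores. It is not taken from the paper: the function name `equal_error_rate`, the score convention (higher means more likely bona fide), and the label encoding (1 = bona fide, 0 = spoof) are assumptions made for illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping a decision threshold over the scores.

    Assumed convention (hypothetical, not from the paper):
    a higher score means "more likely bona fide";
    labels use 1 for bona fide and 0 for spoofed audio.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bona_fide = scores[labels == 1]
    spoof = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also covered.
    thresholds = np.unique(scores)
    thresholds = np.concatenate([thresholds, [thresholds[-1] + 1.0]])

    # A sample is accepted as bona fide when score >= threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])      # spoof accepted
    frr = np.array([(bona_fide < t).mean() for t in thresholds])   # bona fide rejected

    # The EER is the operating point where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]

if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    eer, thr = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

Under this convention, an EER of 0.50% means the threshold can be set so that roughly 1 in 200 spoofed clips is accepted and roughly 1 in 200 genuine clips is rejected. With few samples the two error curves may not cross exactly, which is why the sketch averages them at the closest point.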
Publisher
Institute of Electrical and Electronics Engineers
Rights
CC BY