Publication

Investigating transformer models for textual bias detection in model, data, and dataspace cards

Donald, Andy
Galanopoulos, Apostolos
Kumar Ojha, Atul
Curry, Edward
Muñoz, Emir
Ullah, Ihsan
McCrae, John P.
Kalra, Manan
Saxena, Sagar
Iqbal, Talha
Citation
Donald, Andy, Galanopoulos, Apostolos, Ojha, Atul Kumar, Curry, Edward, Muñoz, Emir, Ullah, Ihsan, McCrae, John P., Kalra, Manan, Saxena, Sagar, Iqbal, Talha. (2026). Investigating transformer models for textual bias detection in model, data, and dataspace cards. AI and Ethics, 6(1), 118. https://doi.org/10.1007/s43681-025-00975-3
Abstract
Identifying hidden biases in AI documentation metadata (model, data, and dataspace cards) is essential for responsible AI, yet this domain remains largely unexplored. This work evaluates four Transformer models (XLNet, DistilBERT, RoBERTa, and ELECTRA) for bias detection across publicly available, synthetic, and custom datasets. On the BABE news corpus, all models achieved 77–80% accuracy, with only ELECTRA exceeding 80% on every metric. To address the absence of publicly available AI-card datasets, we generated synthetic metadata for two use cases (Customer Interaction and Customer Data Uploaded by Organisations) using ChatGPT. Models trained on this synthetic corpus displayed near-perfect scores, reflecting shared stylistic cues embedded in the generated text. To test real-world robustness, we curated a Hugging Face dataset by scraping documentation comments, filtering for bias-related keywords, and obtaining annotations from four independent labellers in a single-blind setting. Partial fine-tuning (zero-shot) evaluations of models trained only on BABE or synthetic data revealed substantial performance drops on this real-world set. To mitigate this cross-domain loss, we introduce a cascaded, full fine-tuning (few-shot) pipeline in which Transformer models are sequentially fine-tuned on BABE, synthetic text, and a subset of the Hugging Face corpus. Evaluation on the remaining portion yielded scores above 85% on all performance metrics, enhancing precision and generalisation. This study demonstrates the challenges of bias detection beyond controlled or synthetic data and highlights cascaded fine-tuning as a practical, low-resource strategy. Future directions include leveraging evidence fusion methods, integrating cross-attention with bias taxonomies, and adopting dual-encoder architectures to advance bias detection toward more in-depth, knowledge-guided reasoning.
Publisher
Springer
Rights
CC BY