Publication

Homophobia and transphobia span identification in low-resource languages

Kumaresan, Prasanna Kumar
Kayande, Devendra Deepak
Priyadharshini, Ruba
Buitelaar, Paul
Chakravarthi, Bharathi Raja
Citation
Kumaresan, Prasanna Kumar, Kayande, Devendra Deepak, Priyadharshini, Ruba, Buitelaar, Paul, & Chakravarthi, Bharathi Raja. (2025). Homophobia and transphobia span identification in low-resource languages. Natural Language Processing Journal, 12, 100169. https://doi.org/10.1016/j.nlp.2025.100169
Abstract
Online platforms have become prevalent because they promote free speech and group discussions. However, they also serve as platforms for hate speech, which can negatively impact the psychological well-being of vulnerable people. This is especially true for members of the LGBTQ+ community, who are often the targets of homophobia and transphobia in online environments. Our study makes three main contributions: (1) we developed a new dataset with span-level annotations for homophobia and transphobia in Tamil, English, and Marathi; (2) we employed advanced language models using BERT-based architectures, Conditional Random Field (CRF), and Bidirectional Long Short-Term Memory (BiLSTM) layers to enhance span-level detection of harmful content; and (3) we conducted benchmarking to evaluate the effectiveness of monolingual and multilingual models in detecting subtle forms of hate speech. The annotated dataset, collected from real-world social media (YouTube) content, provides diverse language contexts and enhances the representation of low-resource languages. The span-based detection approach enables models to detect subtle linguistic nuances, leading to more precise content moderation that accounts for cultural differences. The experimental results show that our models achieve effective span detection, which provides valuable information for creating inclusive moderation tools. Our research contributes to the development of AI systems for content moderation, with the aim of reducing the burden on moderators and improving the quality of online experiences for vulnerable LGBTQ+ individuals.
Publisher
Elsevier
Publisher DOI
https://doi.org/10.1016/j.nlp.2025.100169
Rights
CC BY