Counter-speech generation for homophobic and transphobic social media content in Malayalam
Prasannan, Praveen ; Kumaresan, Prasanna Kumar ; Rajiakodi, Saranya ; Subalalitha, C. N. ; Chakravarthi, Bharathi Raja
Files
s13278-025-01507-x.pdf
Adobe PDF, 3.57 MB
Publication Date
2025-08-12
Type
journal article
Citation
Prasannan, Praveen, Kumaresan, Prasanna Kumar, Rajiakodi, Saranya, Subalalitha, C. N., & Chakravarthi, Bharathi Raja. (2025). Counter-speech generation for homophobic and transphobic social media content in Malayalam. Social Network Analysis and Mining, 15(1), 87. https://doi.org/10.1007/s13278-025-01507-x
Abstract
The growing prevalence of hate speech online has amplified discrimination against marginalized populations, with the LGBTQIA+ community particularly affected. In regions where under-resourced languages such as Malayalam are used, the problem is compounded by the absence of localized resources. This research offers an in-depth analysis of counter-speech generation to address homophobia and transphobia in Malayalam. Our work covers both native Malayalam script and Malayalam written in Latin script, reflecting the diverse linguistic practices of online users in Kerala. This paper introduces a two-stage pipeline to counter such online abuse. The first stage focuses on dataset creation through a human-in-the-loop process, beginning with 100 manually curated seed pairs of hate speech and corresponding counter-speech. This set is expanded iteratively using language models, culminating in 5,000 validated pairs. In the second stage, we propose a method to generate counter-speech in Malayalam that leverages the Retrieval-Augmented Generation framework, enhanced by REFINE (Retrieval Evaluation via Fluency, Inversion, and NEarness) for knowledge retrieval and constrained decoding. Evaluation metrics for both dataset quality and model performance demonstrate the effectiveness of this approach in producing diverse, fluent, and target-specific counter-speech. This research provides a foundational resource and a scalable strategy for countering hate in low-resource regional languages. GitHub Link: https://github.com/Bharathi-AI-for-Social-Good/CN-Malayalam.
Publisher
Springer
Publisher DOI
https://doi.org/10.1007/s13278-025-01507-x
Rights
CC BY