Publication

Detecting caste and migration hate speech in low-resource Tamil language

Chakravarthi, Bharathi Raja
Rajiakodi, Saranya
Ponnusamy, Rahul
Sivagnanam, Bhuvaneswari
Thakare, Sara Yogesh
Thangasamy, Sathiyaraj
Citation
Chakravarthi, Bharathi Raja, Rajiakodi, Saranya, Ponnusamy, Rahul, Sivagnanam, Bhuvaneswari, Thakare, Sara Yogesh, & Thangasamy, Sathiyaraj. (2025). Detecting caste and migration hate speech in low-resource Tamil language. Language Resources and Evaluation. https://doi.org/10.1007/s10579-025-09848-x
Abstract
The Indian constitution categorizes its population into groups such as Scheduled Castes, Scheduled Tribes, Other Backward Classes, and Forward Castes, reflecting historical inequalities that influence social dynamics and discrimination. Migrants who relocate within the country for better opportunities are often viewed as outsiders, which raises concerns about job security and crime and fosters hate and discrimination against them. Social media platforms have exacerbated these issues, becoming hotspots for caste- and migration-related hate speech, especially in low-resource languages. This study introduces a novel dataset specifically curated to detect hate speech related to caste and migration in Tamil, a low-resource language. We benchmarked the dataset with baseline experiments, the best of which achieved a macro-F1 score of 0.73. We also built modified models that integrate custom loss functions, adapter-based fine-tuning, and parameter-efficient fine-tuning techniques. To support further research, we released the dataset, conducted a shared task, and ranked participant systems. By publicly releasing this dataset, we aim to facilitate further study and improve the detection and mitigation of hateful content related to caste and migration on social media.
Publisher
Springer
Rights
CC BY