Publication

Sarcasm detection in Tamil and Malayalam YouTube comments

Chakravarthi, Bharathi Raja
Citation
Chakravarthi, Bharathi Raja. (2025). Sarcasm Detection in Tamil and Malayalam YouTube Comments. Social Network Analysis and Mining, 15(1), 72. https://doi.org/10.1007/s13278-025-01486-z
Abstract
Sarcasm is a standard literary device in which individuals deliberately convey the opposite of what they intend. Accurately identifying sarcasm in text helps reveal a speaker’s genuine intentions and supports other natural language processing tasks, particularly sentiment analysis and offensive language identification. We created a sarcasm dataset from YouTube comments, manually annotated in two Dravidian languages: Tamil (42,244 comments) and Malayalam (18,840 comments). We then benchmarked the dataset with a range of text classifiers. Among these approaches, pre-trained transformer models performed well, achieving an accuracy of 0.798 with TamBERT for Tamil and 0.852 with the MuRIL model for Malayalam. Furthermore, we used SHAP values, an explainable AI technique, to help understand how individual model inputs influence predictions. Finally, we released the dataset on CodaLab, analyzed the participants’ systems, and present the results of the shared task participants.
Publisher
Springer
Rights
CC BY