This study investigates the impact of excluding emojis from Arabic social media datasets on the performance of sarcasm detection models. The researchers used three pre-trained models - AraBERT-v2, AraBERTv02-twitter, and Multi-dialect-BERT-base-Arabic - and evaluated their performance on three datasets: SemEval 2020, YouTube, and L-HSAB.
The results show that excluding emojis from the datasets generally improves the accuracy, recall, precision, and F1-score of the models across the different datasets. This suggests that emojis can introduce ambiguity or noise that degrades the models' ability to classify sarcastic content in Arabic social media posts.
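To make the preprocessing step concrete, the following is a minimal sketch of emoji removal in Python. The summary does not say how the researchers stripped emojis, so the Unicode ranges, the `EMOJI_PATTERN` name, and the `strip_emojis` helper are all assumptions chosen for illustration; they are not the study's actual code and do not cover every emoji code point.

```python
import re

# Assumed emoji code-point ranges (illustrative, not the study's list).
# None of these ranges overlap the Arabic block (U+0600-U+06FF).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F000-\U0001FAFF"  # pictographs, emoticons, transport, flags
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]+"
)


def strip_emojis(text: str) -> str:
    """Replace each run of emojis with a space, then normalize whitespace."""
    without_emojis = EMOJI_PATTERN.sub(" ", text)
    return " ".join(without_emojis.split())


if __name__ == "__main__":
    print(strip_emojis("مرحبا 😂😂 بالعالم"))
```

Replacing emoji runs with a space rather than an empty string keeps neighbouring words from being glued together when an emoji sits between them without surrounding whitespace.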
The researchers found that excluding emojis increased recall, particularly for the AraBERTv02-twitter (TW) and Multi-dialect-BERT-base-Arabic (MD) models, indicating that emojis may hinder the models' ability to detect sarcastic instances. Precision also increased notably when emojis were removed, especially for the AraBERT-v2 (V2) and MD models, meaning fewer false positives.
Similarly, the F1-score improved for most models when emojis were excluded, with only a slight drop for the V2 model, suggesting a better balance between precision and recall. These findings support the hypothesis that removing emojis during preprocessing can enhance the performance of sarcasm detection models on Arabic social media content.
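For readers less familiar with the metrics discussed above, the sketch below shows how precision, recall, and F1-score relate to true positives, false positives, and false negatives in a binary sarcasm classifier. The `binary_metrics` function and the toy labels are hypothetical and are not data or code from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels (1 = sarcastic, 0 = not sarcastic), made up for illustration.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(binary_metrics(y_true, y_pred))
```

Fewer false positives raise precision, fewer missed sarcastic posts raise recall, and F1 is their harmonic mean, which is why it reflects the balance between the two.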
The study establishes new benchmarks in Arabic natural language processing and provides valuable insights for social media platforms seeking to improve their content moderation capabilities, particularly in detecting and addressing sarcastic speech.
Key insights extracted from a paper by Ghalyah H. A... on arxiv.org (05-06-2024):
https://arxiv.org/pdf/2405.02195.pdf