The authors investigated the generalizability of sarcasm detection models by testing their performance on different sarcasm datasets. They found that:
For intra-dataset predictions, models performed best on the Sarcasm Corpus V2 dataset, followed by the Conversation Sarcasm Corpus (CSC) with third-party labels. Models performed worst on the iSarcasmEval dataset, which only had author labels.
For cross-dataset predictions, most models failed to generalize well, implying that no single dataset can represent the diverse styles and domains of sarcasm (a sketch of this train-on-one, test-on-another setup follows the summary).
Models fine-tuned on the new CSC dataset showed the highest generalizability to other datasets, even though CSC is not the largest dataset. The authors attribute this to the psycholinguistically motivated data-collection methodology used for CSC.
The source of sarcasm labels (author vs. third-party) consistently affected model performance, with third-party labels leading to better results.
A post-hoc analysis revealed that the datasets contain sarcasm with distinct linguistic properties, such as negative emotions, social issues, and religious references, which models pick up during fine-tuning (see the second sketch after this summary).
The authors conclude that future sarcasm research should account for the broad scope and diversity of sarcasm, rather than focusing on a narrow definition, to build more robust sarcasm detection systems.
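To make the evaluation setup concrete, here is a minimal sketch of the intra- vs. cross-dataset protocol: split each corpus, train on one corpus's training split, and score on every corpus's test split, so the diagonal gives intra-dataset results and the off-diagonal cells give cross-dataset results. The file names, column names, and the TF-IDF + logistic-regression classifier are illustrative assumptions standing in for the fine-tuned models used in the paper.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical CSV files, one per corpus, each with a "text" column and a
# binary "label" column (1 = sarcastic). Paths and column names are assumptions.
DATASETS = {
    "csc": "csc.csv",
    "sarcasm_corpus_v2": "sarcasm_corpus_v2.csv",
    "isarcasmeval": "isarcasmeval.csv",
}

# Hold out a test split for every corpus.
splits = {}
for name, path in DATASETS.items():
    df = pd.read_csv(path)
    train_df, test_df = train_test_split(
        df, test_size=0.2, random_state=0, stratify=df["label"]
    )
    splits[name] = (train_df, test_df)

# Train on each corpus's train split, evaluate on every corpus's test split.
for train_name, (train_df, _) in splits.items():
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(train_df["text"], train_df["label"])
    for test_name, (_, test_df) in splits.items():
        f1 = f1_score(test_df["label"], clf.predict(test_df["text"]))
        kind = "intra" if train_name == test_name else "cross"
        print(f"train={train_name:18s} test={test_name:18s} ({kind})  F1={f1:.3f}")
```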
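And a minimal sketch of the kind of post-hoc linguistic comparison mentioned above: measuring how often words from a tiny, hand-picked negative-emotion lexicon appear in each corpus. The lexicon and the per-1,000-token normalization are illustrative assumptions, not the paper's actual feature set.

```python
import re
from collections import Counter
import pandas as pd

# Same hypothetical CSV files as in the previous sketch.
DATASETS = {
    "csc": "csc.csv",
    "sarcasm_corpus_v2": "sarcasm_corpus_v2.csv",
    "isarcasmeval": "isarcasmeval.csv",
}

# Toy lexicon for illustration only.
NEGATIVE_EMOTION = {"hate", "angry", "awful", "terrible", "annoying", "worst", "sad"}

def negative_rate(texts):
    """Negative-emotion word hits per 1,000 tokens."""
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", str(text).lower())]
    hits = sum(n for word, n in Counter(tokens).items() if word in NEGATIVE_EMOTION)
    return 1000.0 * hits / max(len(tokens), 1)

for name, path in DATASETS.items():
    texts = pd.read_csv(path)["text"].tolist()
    print(f"{name:18s} negative-emotion words per 1k tokens: {negative_rate(texts):.2f}")
```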
Key insights distilled from: Hyewon Jang, ... on arxiv.org, 04-10-2024
https://arxiv.org/pdf/2404.06357.pdf