The authors introduce pejorative word disambiguation as a preliminary step for misogyny detection, aiming to reduce the error rate of classification models on polysemic words that can serve as pejorative epithets. They build a lexicon of polysemic words with both pejorative and neutral connotations and use it to compile a novel corpus of 1,200 manually annotated Italian tweets for pejorative word disambiguation and misogyny detection.
The authors evaluate two approaches to inject pejorativity information into a misogyny detection model: concatenation and substitution. The results show that the disambiguation of potentially pejorative words leads to notable classification improvements on their corpus and two benchmark corpora in Italian. The authors also analyze the word embedding representation of the AlBERTo model and show that the encoding of lexicon words is closer to their ground-truth connotation after fine-tuning.
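The two injection strategies can be illustrated with a minimal sketch. Note that the lexicon entries, the label format, and the `inject` helper below are all hypothetical: the paper describes concatenation and substitution only at a high level, and the authors' actual input encoding may differ.

```python
# Hypothetical illustration of the two injection strategies:
# "concatenation" appends the disambiguated connotation after the token,
# "substitution" replaces the token with its connotation label.
# The lexicon below is illustrative, not the authors' actual resource.
LEXICON = {"balena", "cagna"}  # hypothetical potentially pejorative words

def inject(tokens, connotations, mode="concatenation"):
    """Inject per-token connotation labels into the input sequence.

    tokens       -- list of input tokens
    connotations -- per-token disambiguation output ("pejorative"/"neutral")
    mode         -- "concatenation" or "substitution"
    """
    out = []
    for tok, conn in zip(tokens, connotations):
        if tok.lower() not in LEXICON or conn == "neutral":
            out.append(tok)  # leave words outside the lexicon untouched
        elif mode == "concatenation":
            out.append(f"{tok} [{conn}]")
        else:  # substitution
            out.append(f"[{conn}]")
    return " ".join(out)
```

For example, for the tokens `["sei", "una", "balena"]` with the last word disambiguated as pejorative, concatenation would yield `sei una balena [pejorative]`, while substitution would yield `sei una [pejorative]`; the augmented string is then fed to the misogyny classifier.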
Furthermore, the authors qualitatively analyze several off-the-shelf instruction-tuned large language models on pejorative word disambiguation, showing that there is ample room for improvement in this task.
Source: https://arxiv.org/pdf/2404.02681.pdf