Key Concepts
This research paper introduces a novel task called Affective State Identification (ASI) for identifying a wide range of emotions and moods in text, moving beyond limited emotion categories. It presents a new benchmark dataset, MASIVE, collected from Reddit, containing over 1,000 unique affective state labels in English and Spanish. The authors demonstrate that fine-tuned smaller language models outperform larger language models on ASI tasks and that training on MASIVE improves performance on traditional emotion detection benchmarks. The paper highlights the importance of native-language data for accurate affective state identification and suggests future research directions for this new field.
Statistics
88% of automatically collected English labels and 72% of Spanish labels were validated as reflecting affective states by human annotators.
Annotators identified 65.8% of English and 81.5% of Spanish affective states as moods rather than emotions.
58.8% of English and 38.5% of Spanish affective states were identified as figurative.
Fine-tuned mT5 achieves higher macro-F1 scores on existing emotion classification datasets after pre-training on MASIVE.
Machine translation of training or evaluation data leads to a significant drop in performance, with an average similarity reduction of 27% in English and 36% in Spanish.
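For readers unfamiliar with the macro-F1 metric cited above, the following is a minimal sketch of how it is computed: F1 is calculated separately for each label and then averaged with equal weight, so rare labels count as much as frequent ones. The function name `macro_f1` and the example labels are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0

    # Count true positives, false positives, and false negatives per label.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true label but was missed

    # Per-label F1 = 2*TP / (2*TP + FP + FN); every label in the union
    # has at least one count, so the denominator is never zero here.
    scores = [
        2 * tp[label] / (2 * tp[label] + fp[label] + fn[label])
        for label in labels
    ]
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only.
gold = ["joy", "joy", "sad", "anger"]
pred = ["joy", "sad", "sad", "joy"]
score = macro_f1(gold, pred)
```

In this toy case the label "anger" is never predicted correctly, so its F1 of 0 lowers the average as much as any frequent label would. This equal weighting matters when the label space is large and long-tailed.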