Core Concepts
A two-stage pipeline combining a fine-tuned LLM for emotion classification with a BiLSTM-based network for cause extraction achieves competitive performance on the SemEval-2024 Task 3 "The Competition of Multimodal Emotion Cause Analysis in Conversations".
Abstract
The paper presents a system developed by the PetKaz team for the SemEval-2024 Task 3 "The Competition of Multimodal Emotion Cause Analysis in Conversations". The task focuses on extracting emotion-cause pairs from dialogues.
The proposed approach consists of two stages:
Emotion classification: The authors fine-tune GPT-3.5 to classify each utterance into one of seven emotion categories (neutral, anger, disgust, fear, joy, sadness, surprise). The model is given both the target utterance and the preceding utterance when making the prediction.
Cause extraction: The authors use a BiLSTM-based network to detect the causal utterances for non-neutral utterances. The model takes into account the utterance embeddings, speaker information, and the emotion label to predict whether a previous utterance is the cause of the current emotional utterance.
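The first stage's input construction can be illustrated with a minimal sketch. The prompt wording and function name are assumptions for illustration; only the use of the target plus preceding utterance and the seven labels comes from the paper.

```python
from typing import Optional

# The seven emotion categories used in the task.
EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]

def build_input(target: str, preceding: Optional[str] = None) -> str:
    """Compose the classifier input from the target utterance and,
    when available, the utterance that precedes it.

    The exact prompt format used for fine-tuning GPT-3.5 is a guess;
    this only shows how both utterances could be combined.
    """
    context = f"Previous utterance: {preceding}\n" if preceding else ""
    labels = ", ".join(EMOTIONS)
    return (f"{context}Target utterance: {target}\n"
            f"Emotion (one of: {labels}):")
```

The first utterance of a dialogue has no preceding context, so `preceding` defaults to `None` and the context line is simply omitted.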
The authors rank 2nd out of 15 teams in Subtask 1 "Textual Emotion-Cause Pair Extraction in Conversations" with a weighted-average proportional F1 score of 0.264, demonstrating the effectiveness of their approach.
The paper also provides an extensive analysis of the model's performance. Key insights include:
The emotion classifier struggles the most with correctly identifying disgust, likely due to the class imbalance in the dataset.
The cause extractor performs better when the cause is closer to the emotional utterance.
The authors observe instances where emotions appear before their causes, suggesting the need to revisit the definition of "cause" in dialogue contexts.
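Since distance to the emotional utterance matters, the cause extractor's candidate construction can be sketched with an explicit distance feature. The class and field names below are illustrative, not the authors' code; the real model scores each pair with a BiLSTM over utterance embeddings, which this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    idx: int        # position in the dialogue
    speaker: str
    emotion: str    # label predicted by the first-stage classifier

def candidate_pairs(dialogue):
    """Pair each non-neutral utterance with itself and every preceding
    utterance, attaching speaker, emotion, and distance features."""
    pairs = []
    for u in dialogue:
        if u.emotion == "neutral":
            continue
        for cand in dialogue[: u.idx + 1]:
            pairs.append({
                "emotion_idx": u.idx,
                "cause_idx": cand.idx,
                "same_speaker": u.speaker == cand.speaker,
                "emotion": u.emotion,
                "distance": u.idx - cand.idx,  # 0 = self-cause
            })
    return pairs
```

Each candidate pair would then be scored by the BiLSTM-based network; filtering by `distance` reflects the observation that nearby causes are detected more reliably.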
Overall, the authors highlight the complexity of accurately identifying emotions and their causes in conversational data, and provide suggestions for future improvements, such as enhancing data annotation and speaker representations.
Stats
91% of emotions have corresponding causes, and one emotion may be triggered by multiple causes in different utterances.
16% of causes trigger several different emotions.
The training set contains 1,236 dialogues with 12,346 utterances and 8,565 emotion-cause pairs.
The development set contains 138 dialogues with 1,273 utterances and 799 emotion-cause pairs.
Quotes
"Recognizing the emotional implications of an utterance provides a deeper understanding of dialog, enabling the development of more human-like dialog systems."
"We believe that this part of the task can be more accurately defined as a causal emotion entailment."