This paper describes a system developed for SemEval-2024 Task 3, Multimodal Emotion Cause Analysis. The proposed two-step framework uses Llama and GPT models to predict emotions and their causes in conversations. The study highlights the value of combining multiple modalities (text, audio, and video) for emotion cause analysis. Using instruction-tuning with Llama models and in-context learning with GPT models, the authors report significant performance gains and rank 4th on the task leaderboard. The dataset contains over 13,000 multimodal utterances from the TV show Friends, annotated with emotion-cause pairs. Through detailed experiments and analysis, the authors show that their approaches are effective at handling the complexities of emotion cause analysis in natural conversation.
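To make the two-step idea concrete, here is a minimal Python sketch, not the authors' implementation. It assumes that step one labels each utterance with an emotion and step two looks for causes only for non-neutral utterances. The names `Utterance`, `analyze`, `stub_emotion`, and `stub_causes` are hypothetical, and the stub predictors stand in for the instruction-tuned Llama or in-context GPT calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Utterance:
    index: int    # 1-based position in the conversation
    speaker: str
    text: str


# Hypothetical predictor signatures; in a real system these would wrap model calls.
EmotionFn = Callable[[List[Utterance], Utterance], str]
CauseFn = Callable[[List[Utterance], Utterance, str], List[int]]


def analyze(
    conversation: List[Utterance],
    predict_emotion: EmotionFn,
    predict_causes: CauseFn,
    neutral_label: str = "neutral",  # assumption: neutral utterances have no cause
) -> List[Tuple[int, str, int]]:
    """Return (emotion_utterance_index, emotion, cause_utterance_index) triples."""
    pairs = []
    for utt in conversation:
        # Step 1: predict the emotion of the target utterance.
        emotion = predict_emotion(conversation, utt)
        if emotion == neutral_label:
            continue
        # Step 2: predict which utterances caused that emotion.
        for cause_index in predict_causes(conversation, utt, emotion):
            pairs.append((utt.index, emotion, cause_index))
    return pairs


# Toy stand-ins for the models, for illustration only.
def stub_emotion(conversation: List[Utterance], utt: Utterance) -> str:
    return "joy" if "wonderful" in utt.text.lower() else "neutral"


def stub_causes(conversation: List[Utterance], utt: Utterance, emotion: str) -> List[int]:
    # Naive heuristic: blame the previous utterance, or the utterance itself if first.
    return [utt.index - 1] if utt.index > 1 else [utt.index]


conversation = [
    Utterance(1, "A", "I got the job."),
    Utterance(2, "B", "That is wonderful news!"),
]
result = analyze(conversation, stub_emotion, stub_causes)
```

Passing the predictors in as functions keeps the pipeline independent of any particular model, so the same loop could sit on top of either a fine-tuned or a prompted model.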
Source: arxiv.org
Deeper questions