Core Concepts
A metric learning approach using Siamese Networks can efficiently model conversational context to achieve state-of-the-art performance on emotion recognition in dialogues.
Abstract
This paper presents a novel approach for Emotion Recognition in Conversation (ERC) using a metric learning strategy based on Siamese Networks. The key highlights are:
The authors propose a two-step training process that combines direct label prediction through cross-entropy loss and relative label assignment through triplet loss. This allows the model to learn both individual emotion representations and the relationships between them.
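The summary does not give the exact loss formulations, but the two components it names are standard: cross-entropy for direct label prediction and a margin-based triplet loss for relative label assignment. A minimal numpy sketch (illustrative only, not the authors' implementation) might look like:

```python
import numpy as np

def cross_entropy(logits, label):
    # Direct label prediction: negative log-likelihood of the true emotion class.
    shifted = logits - logits.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Relative label assignment: pull same-emotion embeddings together and
    # push different-emotion embeddings apart by at least `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In a two-step scheme like the one described, the triplet term shapes the embedding space around emotion similarity while the cross-entropy term anchors embeddings to concrete labels.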
The model leverages sentence embeddings and Transformer encoder layers to represent dialogue utterances and incorporate the conversational context through attention mechanisms. This contextual information is crucial for accurate emotion recognition.
The authors demonstrate that their approach, called SentEmoContext, outperforms state-of-the-art models on the DailyDialog dataset, achieving a macro F1 score of 57.71% alongside a competitive micro F1 score of 57.75%.
Compared to large language models like LLaMA and Falcon, the SentEmoContext model is more efficient, with a smaller size and faster training, while still achieving competitive performance.
The authors address the inherent imbalance in conversational emotion data by using a weighted data loader and loss function, as well as the triplet loss strategy, which helps the model learn robust emotion representations.
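The summary does not specify how the weighting is computed; a common choice, shown here as an assumption rather than the paper's actual scheme, is inverse-frequency class weights that upweight minority emotions relative to the dominant neutral class:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    # Weight each class inversely to its frequency so minority emotions
    # contribute as much to the loss (or sampler) as the majority class.
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0                     # guard against empty classes
    return counts.sum() / (n_classes * counts)
```

Such weights can parameterize both a weighted data loader (sampling probabilities) and a weighted loss function.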
The authors also evaluate their model using the Matthews Correlation Coefficient (MCC), which provides a more comprehensive assessment of classification quality given the imbalanced nature of the data.
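For the binary case, MCC is computed directly from the confusion-matrix counts and ranges from -1 (total disagreement) to +1 (perfect prediction), with 0 for chance-level output, which is why it is robust to class imbalance:

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews Correlation Coefficient from confusion-matrix counts.
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

A multiclass generalization (as needed for ERC's emotion labels) follows the same idea over the full confusion matrix; libraries such as scikit-learn provide it as `matthews_corrcoef`.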
Overall, the SentEmoContext model demonstrates the effectiveness of a metric learning approach for efficient and accurate emotion recognition in conversations, outperforming state-of-the-art models while being more lightweight and adaptable.
Stats
The DailyDialog dataset contains over 13,000 dialogues about daily life concerns, with utterance-level emotion labeling. The dataset is highly imbalanced, with the neutral label being the majority class.
Quotes
"Our main contribution lies in the development of a metric-learning training strategy for emotion recognition on utterances using the conversational context."
"We further demonstrate that our approach outperforms some of the latest state-of-the-art Large Language Models (LLMs) such as light versions of Falcon or LLaMA 2."