This paper presents a novel approach for Emotion Recognition in Conversation (ERC) using a metric learning strategy based on Siamese Networks. The key highlights are:
The authors propose a two-step training process that combines direct label prediction through cross-entropy loss and relative label assignment through triplet loss. This allows the model to learn both individual emotion representations and the relationships between them.
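A minimal PyTorch sketch of this two-step objective is given below: one update driven by cross-entropy (direct label prediction) and one driven by triplet loss (relative label assignment). The encoder, classifier, batch layout, and the exact alternation schedule are illustrative assumptions, not the authors' implementation.

```python
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def training_iteration(encoder, classifier, optimizer, batch):
    # Step 1: predict the emotion label of the anchor utterance directly.
    anchor_emb = encoder(batch["anchor"])        # (B, d) utterance embeddings
    logits = classifier(anchor_emb)              # (B, num_emotions)
    loss_ce = ce_loss(logits, batch["labels"])
    optimizer.zero_grad()
    loss_ce.backward()
    optimizer.step()

    # Step 2: shape the embedding space so that same-emotion utterances
    # sit closer together than different-emotion ones.
    anchor_emb = encoder(batch["anchor"])
    pos_emb = encoder(batch["positive"])         # utterance with the same emotion
    neg_emb = encoder(batch["negative"])         # utterance with a different emotion
    loss_tri = triplet_loss(anchor_emb, pos_emb, neg_emb)
    optimizer.zero_grad()
    loss_tri.backward()
    optimizer.step()

    return loss_ce.item(), loss_tri.item()
```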
The model leverages sentence embeddings and Transformer encoder layers to represent dialogue utterances and incorporate the conversational context through attention mechanisms. This contextual information is crucial for accurate emotion recognition.
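As a rough illustration of this encoding pipeline, the sketch below embeds each utterance with a pretrained sentence encoder and lets a small Transformer encoder attend across the whole dialogue. The model name, dimensions, and layer count are assumptions for illustration, not the paper's configuration.

```python
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Pretrained sentence encoder (384-d embeddings); choice of model is assumed.
sent_encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Self-attention over the utterance sequence injects conversational context.
context_layer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=384, nhead=8, batch_first=True),
    num_layers=2,
)

dialogue = [
    "I finally got the job!",
    "That's wonderful news.",
    "I still can't believe it.",
]

# Encode each utterance independently, then contextualize across the dialogue.
utt_emb = sent_encoder.encode(dialogue, convert_to_tensor=True)   # (T, 384)
contextual = context_layer(utt_emb.unsqueeze(0))                  # (1, T, 384)
```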
The authors demonstrate that their approach, called SentEmoContext, outperforms state-of-the-art models on the DailyDialog dataset in macro F1 score, reaching 57.71%, and remains competitive in micro F1 score at 57.75%.
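To make the macro/micro distinction concrete, the toy example below (with values unrelated to the paper's results) shows how macro F1 averages per-class scores, giving rare emotions equal weight, while micro F1 aggregates over all utterances.

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 3]   # toy emotion labels
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]   # toy predictions

print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # per-class average
print("micro F1:", f1_score(y_true, y_pred, average="micro"))  # global aggregate
```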
Compared to large language models like LLaMA and Falcon, the SentEmoContext model is more efficient, with a smaller size and faster training, while still achieving competitive performance.
The authors address the inherent imbalance in conversational emotion data by using a weighted data loader and a weighted loss function, together with the triplet-loss strategy, which helps the model learn robust emotion representations.
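The sketch below shows one common way to combine these two mechanisms in PyTorch; the toy labels, weighting scheme, and batch size are illustrative assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn
from collections import Counter
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

labels = torch.tensor([0, 0, 0, 0, 1, 2, 0, 1, 0, 0])   # toy, imbalanced labels
counts = Counter(labels.tolist())
class_weights = torch.tensor([1.0 / counts[c] for c in sorted(counts)])

# Weighted data loader: rare emotions are sampled more often.
sample_weights = class_weights[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
features = torch.randn(len(labels), 384)                 # placeholder embeddings
loader = DataLoader(TensorDataset(features, labels), batch_size=4, sampler=sampler)

# Weighted loss: mistakes on rare emotions cost more.
criterion = nn.CrossEntropyLoss(weight=class_weights)
```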
The authors also evaluate their model using the Matthews Correlation Coefficient (MCC), which provides a more comprehensive assessment of the classification quality, considering the imbalanced nature of the data.
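The toy example below illustrates why MCC is a useful complement under imbalance: a classifier that only predicts the majority emotion still scores high accuracy, but its MCC collapses to zero. The labels are illustrative, not drawn from the paper.

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]   # heavily imbalanced toy labels
y_pred = [0] * 10                          # majority-class-only predictions

print("accuracy:", accuracy_score(y_true, y_pred))     # 0.8, looks deceptively good
print("MCC:     ", matthews_corrcoef(y_true, y_pred))  # 0.0, reveals no real skill
```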
Overall, the SentEmoContext model demonstrates the effectiveness of a metric learning approach for efficient and accurate emotion recognition in conversations, outperforming state-of-the-art models while being more lightweight and adaptable.
Key insights distilled from arxiv.org (04-18-2024): https://arxiv.org/pdf/2404.11141.pdf