The paper argues for a paradigm shift in Multimodal Emotion Recognition (MER): moving beyond the limited set of basic emotion labels. The authors argue that current approaches fail to capture the complexity and subtlety of human emotional experience, which limits their generalizability and practicality.
To address this, the paper introduces Open-vocabulary MER (OV-MER), a paradigm that covers a broader range of emotion labels to reflect the richness of human emotions. It relaxes the label space, so a model may predict any number of emotions from any category rather than one label from a fixed list.
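The relaxed label space can be illustrated with a small sketch. It contrasts a closed-set prediction (one label from a fixed list) with an open-vocabulary prediction (a free-form set of labels), and compares two label sets by exact string overlap. All names, labels, and the overlap score here are hypothetical and chosen for illustration; in particular, exact matching is a simplifying assumption that treats near-synonyms as mismatches, and it is not the evaluation metric used in the paper.

```python
from typing import Iterable, Set, Tuple


def normalize(labels: Iterable[str]) -> Set[str]:
    """Lowercase and trim free-form labels, dropping empty strings."""
    return {label.strip().lower() for label in labels if label.strip()}


def set_precision_recall(
    predicted: Iterable[str], reference: Iterable[str]
) -> Tuple[float, float]:
    """Exact-match set overlap (illustrative assumption, not the paper's metric)."""
    pred, ref = normalize(predicted), normalize(reference)
    overlap = len(pred & ref)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    return precision, recall


# Closed-set setting: exactly one label drawn from a fixed, hypothetical list.
BASIC_LABELS = ["happy", "sad", "angry", "neutral"]
closed_prediction = "sad"

# Open-vocabulary setting: any number of labels, not restricted to BASIC_LABELS.
open_prediction = ["Disappointed", "frustrated", "resigned"]
open_reference = ["disappointed", "resigned", "tired", "sad"]

precision, recall = set_precision_recall(open_prediction, open_reference)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because the number of predicted labels can vary, a single accuracy figure no longer applies, which is why the sketch reports two set-level scores instead.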
To support this transition, the authors provide a comprehensive solution.
The paper highlights the importance of OV-MER in advancing emotion recognition from basic to nuanced emotions, contributing to the development of emotional AI. Experimental results demonstrate the limitations of existing Multimodal Large Language Models (MLLMs) in addressing the challenges of OV-MER, which requires the integration of multimodal clues and the capture of subtle temporal variations in emotional expression.
Key insights distilled from
by Zheng Lian, ... at arxiv.org 10-03-2024
https://arxiv.org/pdf/2410.01495.pdf
Deeper Questions