The paper advocates a paradigm shift in Multimodal Emotion Recognition (MER) that moves beyond the small, fixed set of basic emotion labels. The authors argue that current approaches fail to capture the complexity and subtlety of human emotional experience, which restricts their generalizability and practical use.
To address this, the paper introduces a new paradigm, Open-vocabulary MER (OV-MER), which expands the label space to reflect the richness of human emotions. Instead of choosing from a fixed list, a model may predict any number of emotions drawn from any category.
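To make the relaxed label space concrete, here is a minimal sketch of scoring an open-vocabulary prediction. Because the model returns a set of free-form labels rather than one class from a fixed list, evaluation compares sets. The synonym map (`SYNONYMS`), the helper names, and the plain set-level precision/recall are illustrative assumptions, not the metric defined in the paper.

```python
from typing import Dict, Set, Tuple

# Hypothetical synonym map (an assumption, not from the paper): collapses
# surface forms onto a shared group so "joyful" and "happy" match.
SYNONYMS: Dict[str, str] = {
    "joyful": "happy",
    "glad": "happy",
    "irritated": "angry",
    "annoyed": "angry",
}


def normalize(labels: Set[str]) -> Set[str]:
    """Lower-case and trim each free-form label, then map it to its synonym group."""
    out: Set[str] = set()
    for label in labels:
        key = label.strip().lower()
        out.add(SYNONYMS.get(key, key))
    return out


def set_precision_recall(pred: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Set-level precision and recall between predicted and reference labels."""
    p, g = normalize(pred), normalize(gold)
    hits = len(p & g)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(g) if g else 0.0
    return precision, recall


# Open-vocabulary output: any number of labels, from any vocabulary.
pred = {"Joyful", "nervous"}
gold = {"happy", "anxious", "hopeful"}
print(set_precision_recall(pred, gold))
```

Here "Joyful" matches "happy" through the synonym map, while "nervous" misses "anxious" because the hypothetical map does not group them. This shows why grouping related labels matters once the vocabulary is open.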
To support this transition, the authors provide a comprehensive solution that includes:
The paper presents OV-MER as a step from basic toward nuanced emotion recognition and, more broadly, toward emotional AI. Its experiments show that existing Multimodal Large Language Models (MLLMs) struggle with OV-MER, which requires integrating clues across modalities and capturing subtle temporal variations in emotional expression.
Source: Zheng Lian, ... (arxiv.org, 10-03-2024)
https://arxiv.org/pdf/2410.01495.pdf