The paper calls for a shift in Multimodal Emotion Recognition (MER): moving beyond the small, fixed set of basic emotion labels. The authors argue that current approaches fail to capture the complexity and subtlety of human emotional experience, which limits their generalizability and practical use.
To address this, the paper introduces Open-vocabulary MER (OV-MER), a paradigm that covers a broader range of emotion labels to reflect the richness of human emotions. It relaxes the label space so that a model can predict any number of emotions from any category.
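To make the relaxed label space concrete, the following minimal Python sketch contrasts a closed-set prediction with an open-vocabulary one and scores each against a reference label set. Everything in it is an assumption for illustration: the label names, the `BASIC_EMOTIONS` set, and the `set_precision_recall` helper are hypothetical, and exact string matching is a stand-in, not the evaluation procedure used in the paper.

```python
from typing import Set, Tuple

# Hypothetical closed label space, shown only for contrast with the open setting.
BASIC_EMOTIONS = {"happy", "sad", "angry", "fear", "disgust", "surprise"}


def set_precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Set-level precision and recall using exact string matching.

    Assumption: labels count as matching only if the strings are identical.
    Synonyms such as "glad" and "happy" are treated as different labels here.
    """
    if not predicted or not reference:
        return 0.0, 0.0
    overlap = predicted & reference
    return len(overlap) / len(predicted), len(overlap) / len(reference)


# Closed-set prediction: exactly one label drawn from the fixed vocabulary.
closed_prediction = {"happy"}
assert closed_prediction <= BASIC_EMOTIONS

# Open-vocabulary prediction: any number of labels, from any category.
open_prediction = {"relieved", "grateful", "nervous"}

# Hypothetical reference annotation for a single sample.
reference = {"relieved", "grateful"}

closed_scores = set_precision_recall(closed_prediction, reference)
precision, recall = set_precision_recall(open_prediction, reference)
print(f"closed-set: {closed_scores}")
print(f"open-vocabulary: precision={precision:.2f}, recall={recall:.2f}")
```

Because the predicted and reference sets can differ in size and vocabulary, scoring has to work on sets rather than on a single class index. Exact matching is the simplest choice but penalizes near-synonyms, so a practical evaluation would likely need some way of grouping related labels.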
To support this transition, the authors provide a comprehensive solution.
The paper positions OV-MER as a step from basic to nuanced emotion recognition and as a contribution to the development of emotional AI. Experimental results show that existing Multimodal Large Language Models (MLLMs) struggle with OV-MER, which requires integrating multimodal clues and capturing subtle temporal variations in emotional expression.
Source: arxiv.org