Multimodal Emotion-Cause Pair Extraction in Conversations with Specialized Emotion Encoders and Multimodal Language Models
The MER-MCE framework combines specialized emotion encoders for the text, audio, and visual modalities with Multimodal Language Models (MLLMs) to identify emotions and their underlying causes in multimodal conversational data.
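The abstract does not say how the encoder outputs are combined or how cause utterances are selected. The following is a purely illustrative sketch, not the MER-MCE method. It assumes a weighted late fusion of per-modality emotion probabilities and treats each emotional utterance, together with a fixed window of preceding utterances, as candidate causes. The emotion label set, the fusion weights, the window size, and every name (`Utterance`, `fuse`, `predict_emotion`, `candidate_pairs`, `peaked`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical label set; the source does not specify one.
EMOTIONS = ["neutral", "anger", "disgust", "fear", "joy", "sadness", "surprise"]


@dataclass
class Utterance:
    index: int
    # Per-modality emotion probabilities, standing in for the outputs of
    # hypothetical text, audio, and visual emotion encoders.
    modality_probs: Dict[str, List[float]]


def fuse(probs: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-modality probability vectors (late fusion)."""
    total = sum(weights[m] for m in probs)
    return [
        sum(weights[m] * probs[m][k] for m in probs) / total
        for k in range(len(EMOTIONS))
    ]


def predict_emotion(utt: Utterance, weights: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse(utt.modality_probs, weights)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]


def candidate_pairs(
    conversation: List[Utterance], weights: Dict[str, float], window: int = 3
) -> List[Tuple[int, int, str]]:
    """Pair each non-neutral utterance with itself and up to `window`
    preceding utterances as (emotion_idx, cause_idx, emotion) candidates.
    A real system would filter these, e.g. with a prompted MLLM (assumption)."""
    pairs = []
    for utt in conversation:
        emotion = predict_emotion(utt, weights)
        if emotion == "neutral":
            continue
        for j in range(max(0, utt.index - window), utt.index + 1):
            pairs.append((utt.index, j, emotion))
    return pairs


def peaked(label: str, p: float = 0.7) -> List[float]:
    """Toy distribution: mass p on `label`, the rest spread evenly."""
    rest = (1.0 - p) / (len(EMOTIONS) - 1)
    return [p if e == label else rest for e in EMOTIONS]


# Toy usage with made-up encoder outputs and fusion weights.
WEIGHTS = {"text": 0.5, "audio": 0.25, "visual": 0.25}
conversation = [
    Utterance(0, {"text": peaked("neutral"), "audio": peaked("neutral"), "visual": peaked("neutral")}),
    Utterance(1, {"text": peaked("joy"), "audio": peaked("joy"), "visual": peaked("neutral")}),
    Utterance(2, {"text": peaked("sadness"), "audio": peaked("neutral"), "visual": peaked("sadness")}),
]

if __name__ == "__main__":
    for pair in candidate_pairs(conversation, WEIGHTS):
        print(pair)
```

Late fusion is used here only because it keeps each modality's encoder independent, which mirrors the idea of specialized per-modality encoders; the actual framework may fuse features differently.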