The authors frame Multimodal Emotion Cause Pair Extraction both as an utterance labeling problem and as a sequence labeling problem. They then compare the resulting models across different encoders and architectures.
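The sketch below shows how the same gold annotations can be turned into labeling targets. It converts a conversation's emotion-cause pairs into two forms: per-utterance emotion labels, and, for each emotion utterance, a 0/1 cause-tag sequence over the whole conversation. It can also rebuild the pairs from those labels. This is a minimal Python sketch under our own assumptions. The function names, the `"neutral"` default label, and the toy data are hypothetical, and this is not the authors' implementation. A real system would predict these labels from text, audio, and video encoders.

```python
from typing import Dict, List, Tuple

# Hypothetical pair format (an assumption, not the paper's data schema):
# (emotion_utterance_index, emotion_label, cause_utterance_index)
Pair = Tuple[int, str, int]


def to_utterance_labels(num_utts: int, pairs: List[Pair]) -> List[str]:
    """One emotion label per utterance; utterances in no pair get 'neutral'."""
    labels = ["neutral"] * num_utts
    for emo_idx, emotion, _ in pairs:
        labels[emo_idx] = emotion
    return labels


def to_cause_sequences(num_utts: int, pairs: List[Pair]) -> Dict[int, List[int]]:
    """For each emotion utterance, a 0/1 tag per conversation utterance
    marking which utterances are its causes (one tagging target per emotion)."""
    seqs: Dict[int, List[int]] = {}
    for emo_idx, _, cause_idx in pairs:
        seqs.setdefault(emo_idx, [0] * num_utts)[cause_idx] = 1
    return seqs


def to_pairs(utt_labels: List[str], cause_seqs: Dict[int, List[int]]) -> List[Pair]:
    """Inverse mapping: rebuild the sorted pair list from the two label formats."""
    return sorted(
        (emo_idx, utt_labels[emo_idx], j)
        for emo_idx, tags in cause_seqs.items()
        for j, tag in enumerate(tags)
        if tag == 1
    )


if __name__ == "__main__":
    # Toy 4-utterance conversation (illustrative only).
    pairs = [(2, "anger", 1), (2, "anger", 2), (3, "joy", 3)]
    labels = to_utterance_labels(4, pairs)
    seqs = to_cause_sequences(4, pairs)
    print(labels)  # ['neutral', 'neutral', 'anger', 'joy']
    print(seqs)    # {2: [0, 1, 1, 0], 3: [0, 0, 0, 1]}
    print(to_pairs(labels, seqs) == sorted(pairs))  # True
```

An utterance can cause its own emotion, as utterance 3 does here. For that reason, the cause sequence covers every utterance in the conversation, including the emotion utterance itself.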
The importance of emotion cause analysis that integrates text, audio, and video