TelME uses cross-modal knowledge distillation to transfer information from a strong text-based teacher model to the weaker audio and visual modalities, improving their representations. It then fuses the multimodal features with an attention-based shifting approach to recognize emotions in conversations.
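The distillation idea in the summary above can be illustrated with a minimal sketch. It assumes a generic soft-label formulation (KL divergence between temperature-softened teacher and student class distributions); this is not necessarily the loss TelME uses. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.

    Assumption: a generic soft-label distillation objective, scaled by
    temperature**2 as is conventional; the paper's exact loss may differ.
    """
    log_p = log_softmax(teacher_logits, temperature)  # text teacher
    log_q = log_softmax(student_logits, temperature)  # audio or visual student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical emotion logits for one utterance (three classes).
text_teacher = [2.0, 0.5, -1.0]
audio_student = [0.3, 0.2, 0.1]
loss = distillation_loss(text_teacher, audio_student)
```

Minimizing this loss with respect to the student's parameters pulls the audio or visual prediction toward the text teacher's; the loss is zero when the two softened distributions match.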





TelME: A Teacher-led Multimodal Fusion Network for Emotion Recognition in Conversations