Innovative deep-learning architecture for speech emotion recognition using functional data models.
Proposing a method to improve speech emotion recognition accuracy that uses a Vision Transformer (ViT) with knowledge transfer to model correlations across frequency bands and to carry over positional information.
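To make the ViT-on-spectrogram idea concrete, here is a minimal sketch, not the paper's implementation, of a Vision Transformer applied to a mel-spectrogram so that self-attention can relate distant frequency bands; the class name, hyperparameters, and patch layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpectrogramViT(nn.Module):
    """Illustrative ViT over mel-spectrogram patches (assumed design)."""
    def __init__(self, n_mels=128, n_frames=256, patch=16, dim=192,
                 depth=4, heads=3, n_classes=4):
        super().__init__()
        assert n_mels % patch == 0 and n_frames % patch == 0
        n_patches = (n_mels // patch) * (n_frames // patch)
        # Split the spectrogram into patches and embed each one linearly.
        self.to_patches = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        # Learned positional embeddings carry time/frequency location.
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, spec):                 # spec: (B, 1, n_mels, n_frames)
        x = self.to_patches(spec)            # (B, dim, H', W')
        x = x.flatten(2).transpose(1, 2)     # (B, n_patches, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        x = self.encoder(x)
        return self.head(x[:, 0])            # classify from the CLS token

logits = SpectrogramViT()(torch.randn(2, 1, 128, 256))  # -> shape (2, 4)
```

Because the patch embedding discards absolute location, the learned positional embeddings are what let the model distinguish, say, low-frequency pitch structure from high-frequency formant detail; a pretrained image ViT's positional embeddings can be transferred as initialization.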
Investigating the reliability of speech emotion recognition (SER) methods and proposing a unified SER framework.
EmoDistill is a novel speech emotion recognition (SER) framework for learning strong linguistic and prosodic representations of emotion from speech.
Speech emotion recognition is advanced by EMO-SUPERB, a benchmark designed to foster collaboration and open-source initiatives.
EmoDistill proposes a novel framework for speech emotion recognition that leverages cross-modal knowledge distillation to learn linguistic and prosodic representations from speech, achieving state-of-the-art performance.
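Below is a minimal sketch of cross-modal logit distillation in the spirit of EmoDistill: a speech-only student is trained against soft targets from frozen linguistic and prosodic teachers. The function name, loss weights, and temperature are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, ling_logits, pros_logits, labels,
                 T=2.0, alpha=0.5, beta=0.25):
    """Cross-entropy on hard labels plus KL to each teacher's
    temperature-softened distribution (assumed weighting)."""
    ce = F.cross_entropy(student_logits, labels)

    def kd(teacher_logits):
        # Standard Hinton-style KD term; teachers are detached (frozen).
        return F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits.detach() / T, dim=-1),
            reduction="batchmean") * T * T

    return alpha * ce + beta * kd(ling_logits) + beta * kd(pros_logits)

# Example with random tensors standing in for model outputs:
B, C = 8, 4
student = torch.randn(B, C, requires_grad=True)
loss = distill_loss(student, torch.randn(B, C), torch.randn(B, C),
                    torch.randint(0, C, (B,)))
loss.backward()
```

The point of the two KD terms is that the student never sees text at inference time: linguistic and prosodic knowledge is baked into its weights during training, so only the speech encoder runs at test time.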
The author introduces EMO-SUPERB to address key issues in speech emotion recognition, such as reproducibility and data leakage, and to leverage annotators' typed descriptions for improved performance.
The author explores the effectiveness of SER models using real-world voice messages, highlighting the importance of combining expert and non-expert annotations for improved results.