Emotion-Aware Neural Transducer for Fine-Grained Speech Emotion Recognition
The authors propose the Emotion Neural Transducer (ENT) and its factorized variant (FENT) to enable fine-grained speech emotion recognition by jointly modeling acoustic and linguistic information through a neural transducer architecture.