Key Concepts
Optimizing joint CNN and SeqNN architectures using DARTS enhances SER performance.
Summary
The study introduces emoDARTS, a DARTS-optimized joint CNN and sequential neural network (SeqNN) architecture for improved Speech Emotion Recognition (SER). The paper discusses the challenge of designing optimal DL architectures for SER and the potential of Neural Architecture Search (NAS) to automate this process, highlighting Differentiable Architecture Search (DARTS) as an efficient method for discovering strong models. The study demonstrates that emoDARTS outperforms conventional CNN-LSTM models by letting DARTS select the best ordering of layers inside its search cell. The evaluation extends beyond the IEMOCAP dataset to the MSP-IMPROV and MSP-Podcast datasets, showcasing generalization capabilities.
1. Introduction
- SER importance in understanding emotions in speech.
- Advances in DL improve SER performance.
- NAS offers automated model optimization.
- DARTS optimizes joint CNN and SeqNN architecture.
2. Related Work
- Limited literature on using DARTS and NAS for SER.
- Studies combining CNN and LSTM for SER.
3. emoDARTS Framework
- Utilizes DARTS to optimize joint CNN and SeqNN architecture.
- Detailed explanation of how DARTS optimizes network architecture.
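The core DARTS idea the framework relies on can be sketched as follows. This is a minimal, illustrative NumPy sketch (not the authors' implementation): each edge in a search cell mixes all candidate operations with softmax-weighted architecture parameters, so the choice of operation becomes differentiable; after search, the highest-weighted op is kept. The toy operations here stand in for real neural layers such as convolutions and pooling.

```python
import numpy as np

# Illustrative stand-ins for candidate ops in a DARTS cell
# (real candidates would be conv / pooling / skip / zero layers).
OPS = {
    "skip":   lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero":   lambda x: np.zeros_like(x),
}

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mixed_op(x, alpha):
    """DARTS continuous relaxation: an edge's output is the
    softmax(alpha)-weighted sum of all candidate operations,
    which makes the architecture choice differentiable."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, OPS.values()))

def discretize(alpha):
    """After search, keep only the op with the largest weight."""
    return list(OPS)[int(np.argmax(alpha))]
```

During search, the architecture parameters `alpha` are trained on validation loss while the ops' own weights are trained on training loss (a bilevel optimization); `discretize` then yields the final architecture.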
4. Experimental Setup
- Dataset selection: IEMOCAP, MSP-IMPROV, MSP-Podcast.
- Feature selection: MFCC input features.
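To make the MFCC input concrete, here is a minimal from-scratch sketch of the standard MFCC pipeline (windowed power spectrum, mel filterbank, log, DCT-II). The frame size, filter count, and coefficient count are illustrative defaults, not values taken from the paper; a real pipeline would typically use a library such as librosa.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_mfcc=13):
    # Frame the signal with 50% overlap and apply a Hann window.
    hop = n_fft // 2
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank between 0 Hz and Nyquist.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log mel energies, then DCT-II to decorrelate -> MFCCs.
    logmel = np.log(power @ fb.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T  # shape: (num_frames, n_mfcc)
```

The resulting (frames x coefficients) matrix is the kind of 2-D time-frequency input the CNN front end consumes.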
5. Evaluation
- Comparison of emoDARTS with baseline models developed without DARTS.
6. Discussion
- Challenges faced: GPU memory utilization, converging to local minima, high standard deviation in results.
Statistics
Neural Architecture Search (NAS) can help discover optimal neural networks for a given task.
Differentiable Architecture Search (DARTS) reduces search computation time significantly compared with earlier NAS methods.
emoDARTS outperforms conventional CNN-LSTM models by allowing DARTS to choose optimal layer orders.