Stepachev, P., Chen, P., & Haddow, B. (2024). Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models. arXiv preprint arXiv:2410.03312v1.
This research investigates the optimal use of large language models (LLMs) for speech emotion recognition (SER) in a post-ASR setting, focusing on the effective utilization of conversation context and outputs from multiple ASR systems.
The researchers explored various prompting strategies for LLMs using the GenSEC Task 3 dataset, which includes ASR outputs of conversations from the IEMOCAP dataset. They experimented with different methods for selecting and ranking ASR outputs, incorporating variable conversation context lengths, and fusing outputs from multiple ASR systems. The performance of these strategies was evaluated based on their accuracy in predicting speaker emotions.
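To make this pipeline concrete, below is a minimal sketch in Python of how a training-free SER prompt might be assembled from recent conversation context plus transcripts of the target utterance from several ASR systems. The function name `build_prompt`, the prompt wording, and the four-class label set are illustrative assumptions for this summary, not the paper's exact template or code.

```python
from typing import List

# Assumed four-class IEMOCAP-style label set; the actual GenSEC Task 3 labels may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def build_prompt(context_utterances: List[str],
                 asr_hypotheses: List[str],
                 num_context: int = 3) -> str:
    """Assemble a training-free SER prompt from conversation context and
    transcripts of the target utterance produced by different ASR systems.
    The wording below is illustrative, not the paper's exact template."""
    # Keep only the most recent utterances as context (context length is a tunable choice).
    context = context_utterances[-num_context:] if num_context > 0 else []
    lines = ["You are given a conversation and several automatic transcripts "
             "of the speaker's latest utterance."]
    if context:
        lines.append("Conversation so far:")
        lines += [f"- {u}" for u in context]
    lines.append("Transcripts of the target utterance (from different ASR systems):")
    lines += [f"{i + 1}. {hyp}" for i, hyp in enumerate(asr_hypotheses)]
    lines.append("Which emotion best describes the speaker? "
                 f"Answer with one word from: {', '.join(EMOTIONS)}.")
    return "\n".join(lines)

if __name__ == "__main__":
    prompt = build_prompt(
        context_utterances=["How did the interview go?",
                            "They said they'd call me back."],
        asr_hypotheses=["i guess we will see what happens",
                        "i guess we'll see what happens"],
    )
    # Send this prompt to any instruction-following LLM and parse the one-word reply.
    print(prompt)
```

In this sketch, system fusion simply means listing all ASR hypotheses in one prompt and letting the LLM reconcile them; the paper also explores selecting and ranking hypotheses before inclusion, which would correspond to filtering or reordering `asr_hypotheses` prior to calling `build_prompt`.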
This study highlights the potential of LLMs for training-free speech emotion recognition by effectively leveraging context and multiple ASR system outputs. The proposed prompting strategies, particularly those incorporating context and system fusion, significantly improve accuracy without requiring task-specific LLM training. This approach also mitigates the risk of overfitting to speaker-specific or ASR system-specific biases.
This research contributes to the growing field of LLM applications in speech processing, demonstrating their effectiveness in a challenging task like SER. The findings have implications for developing robust and generalizable SER systems that rely on readily available LLMs without extensive training.
The study primarily focuses on a single dataset and a limited set of LLM prompting strategies. Future research could explore the generalizability of these findings to other datasets and languages. Additionally, investigating more sophisticated context modeling techniques and alternative fusion methods could further enhance SER performance.