This paper investigates joint sentiment analysis of music by combining information from both audio and lyrics. The authors explore the two main approaches to emotion classification in music: the categorical approach, which sorts emotions into discrete classes, and the dimensional approach, which maps emotions onto a continuous space such as the valence-arousal plane.
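To make the dimensional approach concrete, here is a minimal sketch of how a valence-arousal annotation can be discretized into categorical labels. This quadrant scheme is a common convention, not necessarily the taxonomy used in the paper; the function name and labels are illustrative.

```python
def va_to_quadrant(valence: float, arousal: float) -> str:
    """Map a point in valence-arousal space (both in [-1, 1]) to one of
    four coarse emotion categories using the common quadrant scheme."""
    if valence >= 0 and arousal >= 0:
        return "happy"   # positive valence, high arousal
    if valence < 0 and arousal >= 0:
        return "angry"   # negative valence, high arousal
    if valence < 0:
        return "sad"     # negative valence, low arousal
    return "calm"        # positive valence, low arousal

print(va_to_quadrant(0.6, -0.3))  # calm
```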
The study utilizes two datasets, the VA (valence-arousal) dataset and the MIREX-like dataset, both of which provide annotations for audio as well as lyrics. For the audio modality, the authors employ a model from the USC SAIL team that performed well in the MediaEval "Emotions and Themes in Music" task. For the text modality, they evaluate four models from the Hugging Face platform, including a sentiment model specialized for lyrics and models adapted to poetic language.
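As an illustration of how such off-the-shelf text models are typically applied to lyrics, the sketch below uses the Hugging Face transformers pipeline API. The checkpoint name is a placeholder, since the summary does not specify which four models were evaluated.

```python
from transformers import pipeline

# Placeholder model ID; substitute the actual lyrics-sentiment checkpoint.
MODEL_ID = "some-org/lyrics-sentiment-model"

classifier = pipeline("text-classification", model=MODEL_ID)

lyrics = "I walk these empty streets alone tonight"
print(classifier(lyrics))
# e.g. [{'label': 'negative', 'score': 0.87}]
```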
The single-model experiments show that the text-based models slightly outperform the audio-only model at classifying negative emotions, while the audio model is better at identifying positive ones. To leverage the strengths of both modalities, the authors investigate three fusion methods, sketched below: selecting the class predicted with the highest probability across modalities, averaging the two models' predictions, and a weighted combination. The weighted approach, with a 60% audio / 40% text ratio, emerges as the most effective strategy, improving performance across several metrics compared to either modality alone.
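The three fusion schemes can be summarized in a few lines. This sketch assumes each modality outputs a probability distribution over the same emotion classes; interpreting "class selection" as trusting the more confident modality is an assumption, and the 0.6/0.4 weighting follows the ratio reported above.

```python
import numpy as np

def fuse(p_audio: np.ndarray, p_text: np.ndarray,
         method: str = "weighted", w_audio: float = 0.6) -> int:
    """Late fusion of per-class probabilities from two modalities.

    p_audio, p_text: arrays of shape (n_classes,) that each sum to 1.
    Returns the index of the predicted emotion class.
    """
    if method == "max":       # class selection: trust the more confident modality
        return int(np.argmax(p_audio) if p_audio.max() >= p_text.max()
                   else np.argmax(p_text))
    if method == "average":   # unweighted mean of the two distributions
        return int(np.argmax((p_audio + p_text) / 2))
    if method == "weighted":  # weighted combination, e.g. 60% audio / 40% text
        return int(np.argmax(w_audio * p_audio + (1 - w_audio) * p_text))
    raise ValueError(f"unknown fusion method: {method}")

# Example: audio leans positive, text leans negative.
p_a = np.array([0.7, 0.3])   # [positive, negative] from the audio model
p_t = np.array([0.4, 0.6])   # [positive, negative] from the text model
print(fuse(p_a, p_t, "weighted"))  # 0 -> positive (0.6*0.7 + 0.4*0.4 = 0.58)
```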
The analysis reveals that the combination of audio and text data can better capture the complex interplay of musical elements and lyrical content in expressing sentiment, leading to more nuanced and comprehensive music emotion recognition. The authors also discuss the challenges in this field, such as the lack of standardized emotion taxonomies, subjective perception of music, and the scarcity of high-quality bimodal datasets. They propose future research directions, including the development of novel multimodal models that can jointly process text and audio inputs to further advance the field of music sentiment analysis.
Source: Lea Schaab et al., arxiv.org, 05-06-2024, https://arxiv.org/pdf/2405.01988.pdf