Meguenani, M.E.A., Britto Jr., A.S., & Koerich, A.L. (2024). Music Genre Classification using Large Language Models. arXiv preprint arXiv:2410.08321v1.
This paper investigates how well pre-trained large language models (LLMs) perform music genre classification (MGC) in a zero-shot setting, where the pre-trained models serve as frozen feature extractors rather than being fine-tuned. It compares their performance with that of traditional deep learning architectures.
The researchers extracted feature vectors from various layers of three pre-trained audio LLMs (WavLM, HuBERT, and wav2vec 2.0) and used them to train a classification head. They compared the performance of these models with 1D and 2D convolutional neural networks (CNNs) and the audio spectrogram transformer (AST) on the GTzan dataset using 3-fold cross-validation.
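The pipeline described above can be sketched as follows. A frozen backbone supplies features, a trainable classification head learns from them, and 3-fold cross-validation measures accuracy. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation. Each clip-level feature is assumed to be the time-average of one layer's frame-level hidden states. With Hugging Face `transformers`, such states would come from the `hidden_states` output of a WavLM, HuBERT, or wav2vec 2.0 model called with `output_hidden_states=True`. The classification head is assumed to be a single linear softmax layer. The helper names (`mean_pool`, `train_linear_head`, `stratified_folds`, `cross_validate`) are hypothetical. The synthetic "genre" clusters in the usage block stand in for real embeddings and say nothing about the paper's results.

```python
import math
import random


def mean_pool(frames):
    """Average T frame-level vectors (each D-dim) into one D-dim clip vector.

    Assumption: in a real pipeline, `frames` would be one layer's hidden
    states from a frozen WavLM/HuBERT/wav2vec 2.0 backbone.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def train_linear_head(X, y, n_classes, epochs=200, lr=0.1):
    """Hypothetical head: multinomial logistic regression, full-batch gradient descent."""
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            scores = [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(n_classes)]
            p = softmax(scores)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)  # d(cross-entropy)/d(score_c)
                gb[c] += err
                for d in range(dim):
                    gW[c][d] += err * x[d]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for d in range(dim):
                W[c][d] -= lr * gW[c][d] / n
    return W, b


def predict(W, b, x):
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


def stratified_folds(y, k=3, seed=0):
    """Split sample indices into k folds with (near-)equal class counts per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validate(X, y, n_classes, k=3, seed=0):
    """k-fold CV: train a fresh head on k-1 folds, report accuracy on the held-out fold."""
    accuracies = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        W, b = train_linear_head([X[i] for i in train_idx], [y[i] for i in train_idx], n_classes)
        correct = sum(predict(W, b, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


if __name__ == "__main__":
    # Synthetic stand-in for embeddings: 3 "genres", 30 clips each, 50 frames of 4-dim features.
    rng = random.Random(42)
    n_classes, dim = 3, 4
    X, y = [], []
    for c in range(n_classes):
        for _ in range(30):
            frames = [[rng.gauss(3.0 if d == c else 0.0, 1.0) for d in range(dim)] for _ in range(50)]
            X.append(mean_pool(frames))
            y.append(c)
    accs = cross_validate(X, y, n_classes, k=3)
    print("per-fold accuracy:", accs, "mean:", sum(accs) / len(accs))
```

Because the backbone is never updated, only the head's weights are learned in each fold. Feeding `mean_pool` the hidden states of a different layer is one way to run a layer-wise comparison like the one described.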
This research contributes to the field of music information retrieval (MIR) by demonstrating the potential of LLMs and transformer-based models for MGC. It also suggests that these models could be applied to other music-related tasks.
The study acknowledges limitations arising from the GTzan dataset's known integrity issues (such as repeated, mislabeled, and distorted excerpts) and its limited genre diversity (only 10 genres). Future research could fine-tune these models on larger, more diverse datasets and investigate their effectiveness in related tasks such as music recommendation or mood classification.