Meguenani, M.E.A., Britto Jr., A.S., & Koerich, A.L. (2024). Music Genre Classification using Large Language Models. arXiv preprint arXiv:2410.08321v1.
This paper investigates the efficacy of pre-trained large language models (LLMs) for music genre classification (MGC) in a zero-shot setting, comparing their performance to traditional deep learning architectures.
The researchers extracted feature vectors from various layers of three pre-trained audio LLMs (WavLM, HuBERT, and wav2vec 2.0) and used them to train a classification head. They compared these models against 1D and 2D convolutional neural networks (CNNs) and the audio spectrogram transformer (AST) on the GTZAN dataset using 3-fold cross-validation.
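The evaluation protocol described above can be illustrated with a minimal sketch. It assumes the per-frame features of one model layer have already been extracted for each clip, and it averages them into a single clip-level vector. The paper's trained classification head is replaced here by a nearest-centroid classifier purely to keep the example short and dependency-free. All function names (`mean_pool`, `stratified_folds`, `fit_centroids`, `predict`, `cross_validate`), the mean-pooling step, the stratified split, and the toy data are assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict

def mean_pool(frames):
    """Average a list of equal-length frame vectors into one clip-level vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def stratified_folds(labels, k=3, seed=0):
    """Split sample indices into k folds, keeping genre proportions roughly equal."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        # Deal the shuffled indices of each genre round-robin across the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

def fit_centroids(vectors, labels):
    """Stand-in head (assumption): one mean vector per genre, not a trained network."""
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[y].append(v)
    return {y: mean_pool(vs) for y, vs in groups.items()}

def predict(centroids, vector):
    """Return the genre whose centroid is nearest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

def cross_validate(vectors, labels, k=3, seed=0):
    """Train on k-1 folds, test on the held-out fold, and return per-fold accuracy."""
    accuracies = []
    for held_out in stratified_folds(labels, k, seed):
        test_set = set(held_out)
        train_x = [v for i, v in enumerate(vectors) if i not in test_set]
        train_y = [y for i, y in enumerate(labels) if i not in test_set]
        centroids = fit_centroids(train_x, train_y)
        correct = sum(predict(centroids, vectors[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies

# Toy stand-in for extracted layer features (hypothetical data):
# 2 genres, 6 clips each, 4 frames of 3-dimensional features per clip.
rng = random.Random(1)
clips, genres = [], []
for genre, offset in (("blues", 0.0), ("metal", 5.0)):
    for _ in range(6):
        clips.append([[offset + rng.random() for _ in range(3)] for _ in range(4)])
        genres.append(genre)

features = [mean_pool(clip) for clip in clips]
fold_accuracies = cross_validate(features, genres, k=3)
print(fold_accuracies)
```

In a real experiment the toy `clips` would be replaced by hidden states taken from a chosen layer of WavLM, HuBERT, or wav2vec 2.0, and the centroid classifier by a trained head. Repeating the loop once per layer would then show which layer's representation separates genres best.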
This research contributes to the field of music information retrieval (MIR) by demonstrating the potential of LLMs and transformer-based models for MGC, paving the way for their application in other music-related tasks.
The study acknowledges limitations stemming from the GTZAN dataset's integrity issues and limited genre diversity. Future research could fine-tune these models on larger, more diverse datasets and investigate their effectiveness in related tasks such as music recommendation or mood classification.
Key insights distilled from: Mohamed El A..., arxiv.org, 10-14-2024
https://arxiv.org/pdf/2410.08321.pdf