This study investigates whether, and to what extent, state-of-the-art music generation models such as Jukebox and MusicGen encode fundamental Western music theory concepts in their internal representations. The authors introduce SynTheory, a synthetic collection of seven datasets, each isolating a single music theory concept: tempo, time signatures, notes, intervals, scales, chords, and chord progressions.
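To make the idea of an isolated concept concrete, the sketch below enumerates labeled triads as MIDI pitch lists. It is an illustration only, not SynTheory's actual generation code: the function name `triad_examples`, the choice of four triad qualities, and the octave starting at middle C (MIDI 60) are all assumptions made for this example.

```python
# Interval patterns in semitones above the root (standard music theory).
TRIAD_QUALITIES = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
    "diminished": (0, 3, 6),
    "augmented": (0, 4, 8),
}
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_examples(base_midi=60):
    """Enumerate labeled triads for every root and quality.

    base_midi=60 is middle C; the octave is an illustrative assumption.
    """
    examples = []
    for offset, name in enumerate(NOTE_NAMES):
        root = base_midi + offset
        for quality, pattern in TRIAD_QUALITIES.items():
            pitches = [root + step for step in pattern]
            examples.append({"root": name, "quality": quality, "midi": pitches})
    return examples


examples = triad_examples()
```

Because only the root and quality vary, every label in such a set corresponds to exactly one concept, which is the property a probing dataset needs.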
The authors use a probing approach to assess the degree to which these music theory concepts are discernible in the internal representations of the music generation models. They train probing classifiers on the embeddings extracted from different layers and components of the models, including the audio codecs and decoder language models. The probing results suggest that music theory concepts are indeed encoded within these models, with the degree of encoding varying across different concepts, model sizes, and model layers.
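The probing setup described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the embeddings are synthetic stand-ins generated by a hypothetical `make_embeddings` helper, the probe is a plain logistic regression trained with stochastic gradient descent, and the `signal` parameter simply controls how strongly a binary concept label is written into the representation.

```python
import math
import random


def make_embeddings(n, dim, signal, rng):
    """Synthetic stand-in for model embeddings (hypothetical data).

    Each example has a binary concept label. `signal` shifts the first
    coordinate by the label; 0.0 means the concept is not encoded at all.
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal * (1 if label else -1)
        data.append((vec, label))
    return data


def train_probe(train, dim, epochs=50, lr=0.1):
    """Fit a logistic-regression probe with per-example gradient steps."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for vec, label in train:
            z = sum(wi * xi for wi, xi in zip(w, vec)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - label  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, vec)]
            b -= lr * g
    return w, b


def accuracy(probe, data):
    """Fraction of examples whose label the probe predicts correctly."""
    w, b = probe
    correct = 0
    for vec, label in data:
        z = sum(wi * xi for wi, xi in zip(w, vec)) + b
        correct += int((z > 0) == bool(label))
    return correct / len(data)


def probe_layer(signal, seed=0, dim=8):
    """Train on one split and report held-out accuracy for one 'layer'."""
    rng = random.Random(seed)
    train = make_embeddings(400, dim, signal, rng)
    test = make_embeddings(200, dim, signal, rng)
    return accuracy(train_probe(train, dim), test)


encoded_acc = probe_layer(signal=2.0)    # concept linearly present
unencoded_acc = probe_layer(signal=0.0)  # concept absent
```

Held-out accuracy well above chance for the first "layer" and near chance for the second is the kind of contrast a probing study reads as evidence that a concept is, or is not, linearly recoverable from a representation.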
The authors find that Jukebox performs consistently well across all SynTheory tasks, while the MusicGen decoder language models also perform competitively. Interestingly, the smaller MusicGen model outperforms its larger counterparts on these probing tasks, which suggests that it may have developed a more efficient encoding of music theory concepts in its representations.
The authors also benchmark the music generation models against handcrafted audio features: mel spectrograms, MFCCs, and chroma. The pretrained music decoder language models generally outperform the individual handcrafted features, but the aggregate of the handcrafted features performs comparably to the MusicGen decoder language models.
The insights from this study can inform future efforts towards more detailed and lower-level control in music generation, as well as the development of more challenging probing datasets to further understand the relationship between symbolic and audio-based music generation.
Key insights extracted from
by Megan Wei, M... at arxiv.org 10-02-2024
https://arxiv.org/pdf/2410.00872.pdf