The NES Video-Music Database (NES-VMDB) extends the Nintendo Entertainment System Music Database (NES-MDB), a corpus of 5,278 music pieces from 397 NES games. The NES-VMDB associates 4,070 of these pieces with short gameplay clips from the game scenes where the music is played.
To create the NES-VMDB, the authors first obtained long-play videos for 389 NES games from YouTube. They divided each video into 15-second clips and extracted the audio. Then, they used an audio fingerprinting algorithm (similar to Shazam) to automatically identify the corresponding MIDI piece from the NES-MDB dataset for each audio clip.
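The summary treats the fingerprinting step as a black box, but the Shazam-style landmark approach it alludes to is easy to sketch: hash pairs of spectrogram peaks, then let matching hashes vote for a (track, time-offset) pair. The Python sketch below is an illustrative assumption rather than the authors' pipeline; the function names, window sizes, peak neighborhood, fan-out, and target-zone limit are all made-up parameters.

```python
import numpy as np
from collections import defaultdict
from scipy.signal import stft
from scipy.ndimage import maximum_filter

def fingerprint(audio, sr=22050, fan_out=5):
    """Shazam-style landmarks: hash pairs of spectrogram peaks."""
    _, _, Z = stft(audio, fs=sr, nperseg=1024, noverlap=512)
    S = np.abs(Z)
    # Keep time-frequency points that dominate their local neighborhood.
    peaks = (S == maximum_filter(S, size=(20, 20))) & (S > S.mean())
    freqs, times = np.nonzero(peaks)
    order = np.argsort(times)
    freqs, times = freqs[order], times[order]
    hashes = []
    for i in range(len(times)):
        # Pair each anchor peak with the next few peaks in its target zone.
        for j in range(i + 1, min(i + 1 + fan_out, len(times))):
            dt = times[j] - times[i]
            if 0 < dt <= 100:
                hashes.append(((freqs[i], freqs[j], dt), times[i]))
    return hashes

def match(query_hashes, db):
    """db maps hash -> list of (track_id, time). Vote on aligned offsets."""
    votes = defaultdict(int)
    for h, t in query_hashes:
        for track_id, t_db in db.get(h, []):
            votes[(track_id, t_db - t)] += 1
    if not votes:
        return None
    (track_id, _offset), _count = max(votes.items(), key=lambda kv: kv[1])
    return track_id
```

Because votes are keyed on the time offset as well as the track, a clip only matches a piece when many hashes agree on the same alignment, which is what makes this family of schemes robust to the sound effects and noise present in gameplay audio.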
The authors also established a baseline generative model based on the Controllable Music Transformer (CMT) to generate NES music conditioned on gameplay videos. They trained the CMT model on the NES-VMDB MIDI pieces and generated new music by conditioning it with rhythmic features extracted from the gameplay clips.
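CMT conditions its output on rhythm-related video features such as timing, motion speed, and motion saliency. As a rough illustration of one such cue, the following minimal sketch (assuming OpenCV; the function name, sampling rate, and normalization are assumptions, not the paper's feature extractor) scores motion intensity from mean absolute frame differences:

```python
import cv2
import numpy as np

def motion_intensity(video_path, fps_out=4):
    """Per-step motion strength from mean absolute frame differences.
    A crude stand-in for CMT-style motion features."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(round(fps / fps_out)))  # subsample to ~fps_out Hz
    prev, scores, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
            if prev is not None:
                scores.append(float(np.abs(gray - prev).mean()))
            prev = gray
        idx += 1
    cap.release()
    s = np.asarray(scores)
    if s.size == 0:
        return s
    return (s - s.min()) / (s.max() - s.min() + 1e-8)  # normalize to [0, 1]
```

A curve like this can then be quantized into discrete levels and fed to the transformer as conditioning tokens, so that busier on-screen action can correspond to denser generated notes.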
The authors evaluated the generated music using objective metrics related to music structure, such as pitch class histogram entropy, grooving pattern similarity, pitch range, and number of notes played concurrently. The results showed that the conditional CMT model generated music that was more structurally similar to human-composed pieces compared to the unconditional model.
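Two of these metrics have compact definitions, sketched below following their common formulations in the symbolic-music evaluation literature (the paper's exact binning and normalization may differ): pitch-class histogram entropy measures tonal focus, and grooving pattern similarity compares binary onset grids between bars.

```python
import numpy as np

def pitch_class_entropy(pitches):
    """Shannon entropy (bits) of the 12-bin pitch-class histogram.
    Lower values indicate more tonally focused music."""
    hist = np.bincount(np.asarray(pitches) % 12, minlength=12).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def grooving_similarity(bar_a, bar_b):
    """1 minus the mean XOR of two binary onset grids (one entry per
    sub-beat position); 1.0 means identical rhythmic patterns."""
    a, b = np.asarray(bar_a, dtype=float), np.asarray(bar_b, dtype=float)
    return float(1.0 - np.abs(a - b).mean())
```

Computing such statistics over generated pieces and over the human-composed NES-MDB pieces, then comparing the two distributions, is what makes "structurally similar to human-composed pieces" measurable.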
Additionally, the authors trained a neural classifier to predict the game genre of the generated pieces. The results indicated that the conditional CMT model was able to learn correlations between gameplay videos and game genres, but further research is needed to achieve human-level performance.
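The summary does not detail the classifier's architecture, so the PyTorch sketch below is a placeholder: a small CNN over piano-roll tensors with one input channel per NES voice. The input shape, layer sizes, and number of genres are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GenreClassifier(nn.Module):
    """Toy CNN over piano-roll clips; not the paper's actual model."""
    def __init__(self, n_genres, n_voices=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_voices, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_genres),
        )

    def forward(self, x):  # x: (batch, voices, pitch, time)
        return self.net(x)  # logits over genres

# Example: 2 clips, 4 NES voices, 88 pitches, 256 time steps.
logits = GenreClassifier(n_genres=8)(torch.randn(2, 4, 88, 256))
```

If a classifier of this kind assigns generated pieces the genre of the game they were conditioned on more often than chance, that is evidence the generator picked up genre-correlated cues from the video.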
The NES-VMDB dataset and the baseline CMT model provide a foundation for future research on generating video game music from gameplay data, with the goal of supporting indie game developers in creating music for their projects.
Source: Igor Cardoso et al., arXiv (04-09-2024), https://arxiv.org/pdf/2404.04420.pdf