The paper presents SongTrans, a unified model for automatic transcription and alignment of song lyrics and musical notes. The key highlights are:
SongTrans consists of two modules:
SongTrans achieves state-of-the-art performance on both lyric transcription and note transcription tasks, outperforming existing specialized models.
SongTrans is the first model capable of aligning lyrics and notes, eliminating the need for pre-processing steps like vocal-accompaniment separation or forced alignment.
The authors design a data annotation pipeline to gather a large dataset of song-lyric-note pairs, which is used to train the SongTrans model.
Experiments show that SongTrans adapts to diverse song settings, including raw songs, vocals-only tracks, and vocals with accompaniment.
Merging the authors' annotated data with the existing M4Singer dataset further improves SongTrans' performance, demonstrating the value of the custom-annotated data.
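To make the idea of lyric-note alignment from the highlights above more concrete, here is a minimal sketch of what an aligned output could look like as a data structure. It is not the SongTrans method: the class names (`Note`, `AlignedWord`), the helper `attach_notes`, the MIDI pitch representation, and the toy timings are all hypothetical, chosen only for illustration. The sketch assigns each note to the word whose time span contains the note's onset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    pitch: int       # MIDI pitch number (assumed representation)
    onset: float     # start time in seconds
    duration: float  # length in seconds


@dataclass
class AlignedWord:
    text: str
    onset: float     # start time in seconds
    duration: float  # length in seconds
    notes: List[Note] = field(default_factory=list)


def attach_notes(words: List[AlignedWord], notes: List[Note]) -> List[AlignedWord]:
    """Attach each note to the first word whose span contains the note onset."""
    for note in notes:
        for word in words:
            if word.onset <= note.onset < word.onset + word.duration:
                word.notes.append(note)
                break
    return words


# Toy example (made-up values): two sung syllables and three notes.
words = [AlignedWord("hel", 0.0, 0.5), AlignedWord("lo", 0.5, 1.0)]
notes = [Note(60, 0.0, 0.5), Note(62, 0.5, 0.4), Note(64, 0.9, 0.6)]
aligned = attach_notes(words, notes)

for w in aligned:
    print(w.text, [n.pitch for n in w.notes])
```

In this toy case the second syllable carries two notes, which is the kind of one-to-many word-to-note relation that an alignment between lyrics and notes has to represent.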
Key insights distilled from: Siwei Wu, Ji... at arxiv.org, 09-24-2024
https://arxiv.org/pdf/2409.14619.pdf