Core Concept
SongTrans is a unified model that can directly transcribe and align song lyrics and musical notes without requiring pre-processing or separate tools.
Summary
The paper presents SongTrans, a unified model for automatic transcription and alignment of song lyrics and musical notes. The key highlights are:
- SongTrans consists of two modules:
  - Autoregressive module: predicts lyrics, word durations, and note numbers.
  - Non-autoregressive module: predicts note pitches and durations.
- SongTrans achieves state-of-the-art performance on both lyric transcription and note transcription, outperforming existing specialized models.
- SongTrans is the first model capable of aligning lyrics and notes, eliminating the need for pre-processing steps such as vocal-accompaniment separation or forced alignment.
- The authors design a data annotation pipeline to collect a large dataset of song-lyric-note pairs, which is used to train SongTrans.
- Experiments show that SongTrans adapts effectively to diverse song settings, including raw songs, separated vocals, and vocals segmented by silence.
- Merging the authors' annotated data with the existing M4Singer dataset further improves SongTrans's performance, demonstrating the value of the custom-annotated data.
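To make the division of labor between the two modules concrete, here is a minimal Python sketch of how their outputs could be combined into word-level lyric-note alignments. It assumes that "note numbers" means the count of notes sung on each word, and that the non-autoregressive module emits one flat, ordered list of (pitch, duration) notes; the class names, fields, MIDI pitch encoding, and the `align` function are all hypothetical and are not SongTrans's actual interface.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordPrediction:
    """Hypothetical per-word output of the autoregressive module."""
    word: str
    duration: float  # word duration in seconds
    num_notes: int   # assumed reading of "note number": notes sung on this word


@dataclass
class Note:
    """Hypothetical per-note output of the non-autoregressive module."""
    pitch: int       # MIDI note number (an assumption for this sketch)
    duration: float  # note duration in seconds


@dataclass
class AlignedWord:
    """A word paired with the notes it is sung on."""
    word: str
    duration: float
    notes: List[Note]


def align(words: List[WordPrediction], notes: List[Note]) -> List[AlignedWord]:
    """Group a flat, ordered note sequence under words using per-word note counts."""
    expected = sum(w.num_notes for w in words)
    if expected != len(notes):
        raise ValueError(f"words expect {expected} notes, got {len(notes)}")
    aligned: List[AlignedWord] = []
    start = 0
    for w in words:
        # Each word takes the next num_notes notes, preserving order.
        aligned.append(AlignedWord(w.word, w.duration, notes[start:start + w.num_notes]))
        start += w.num_notes
    return aligned


if __name__ == "__main__":
    words = [WordPrediction("hel", 0.4, 1), WordPrediction("lo", 0.8, 2)]
    notes = [Note(60, 0.4), Note(62, 0.4), Note(64, 0.4)]
    for item in align(words, notes):
        print(item.word, [n.pitch for n in item.notes])
```

Running it prints `hel [60]` and then `lo [62, 64]`. Under these assumptions, because each word carries its own note count, alignment reduces to slicing the note sequence in order, with no separate forced-alignment step; a mismatch between the summed counts and the number of predicted notes is treated as an error rather than silently repaired.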
Statistics
The authors gathered 58,144 songs with lyrics and sentence-level timestamps, resulting in 807,960 sentence-level song-lyric pairs.
After filtering and refinement, the authors obtained 201,649 sentence-level song-lyric pairs for training the lyric transcription model.
The authors used the refined data to train the SongTrans model, which can directly transcribe and align lyrics and notes.
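The reported counts can be put in proportion with a quick calculation (derived here from the figures above, not stated by the authors):

```latex
\frac{807{,}960}{58{,}144} \approx 13.9 \ \text{sentence-level pairs per song},
\qquad
\frac{201{,}649}{807{,}960} \approx 0.250
```

In other words, each song contributes about 14 sentence-level pairs on average, and roughly a quarter of those pairs survive filtering and refinement.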
Quotes
"SongTrans achieves SOTA performance in both lyric and note transcription tasks, and is the first model capable of aligning lyrics and notes."
"Experimental results show that the data labeled by our pipeline enhances the model's overall capability."
"Our SongTrans model can effectively label data under diverse settings, including raw songs, vocals of songs, and vocals segmented by silence."