The paper presents SongTrans, a unified model for automatic transcription and alignment of song lyrics and musical notes. The key highlights are:
SongTrans consists of two modules.
SongTrans achieves state-of-the-art performance on both lyric transcription and note transcription tasks, outperforming existing specialized models.
SongTrans is the first model capable of aligning lyrics and notes, eliminating the need for pre-processing steps like vocal-accompaniment separation or forced alignment.
The authors design a data annotation pipeline to gather a large dataset of song-lyric-note pairs, which is used to train the SongTrans model.
Experiments show that SongTrans can effectively adapt to diverse song settings, including raw songs, vocals-only, and vocals with accompaniment.
Merging the authors' annotated data with the existing M4Singer dataset further improves SongTrans' performance, demonstrating the value of the custom-annotated data.
Key insights distilled from:
Siwei Wu, Ji..., arxiv.org, 09-24-2024
https://arxiv.org/pdf/2409.14619.pdf