Recent singing voice synthesis (SVS) methods have improved audio quality but lack explicit control over style attributes. Prompt-Singer introduces control over singer gender, vocal range, and volume through natural language prompts. The model is a decoder-only transformer with a multi-scale hierarchy. The main challenges are decoupling melody from vocal range, finding a text representation suited to singing style descriptions, and data scarcity caused by limited datasets. Experiments show favorable controlling ability and audio quality.
by Yongqi Wang,... at arxiv.org, 03-19-2024
https://arxiv.org/pdf/2403.11780.pdf