Key Concepts
Prompt-Singer is the first controllable SVS model that uses natural language prompts to control singer gender, vocal range, and volume.
Summary
Recent advances in singing voice synthesis (SVS) have improved audio quality, but existing models lack explicit control over style attributes. Prompt-Singer adds this control through natural language prompts that specify singer gender, vocal range, and volume. The architecture is a decoder-only transformer with a multi-scale hierarchy. The main challenges are decoupling melody from vocal range, finding a textual representation suited to singing style descriptions, and coping with the scarcity of available datasets. Experiments show favorable controlling ability and audio quality.
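To make the melody and vocal range decoupling challenge concrete, here is a minimal sketch of one way to separate the two. It is an illustration under stated assumptions, not Prompt-Singer's actual pitch representation: the function names `decouple_pitch` and `shift_range`, the 440 Hz reference, and the use of mean pitch as the "range" value are all hypothetical choices made for this example.

```python
import math

REF_HZ = 440.0  # assumed reference pitch (A4); an arbitrary choice for this sketch


def hz_to_semitones(f_hz):
    """Convert a frequency in Hz to semitones relative to REF_HZ."""
    return 12.0 * math.log2(f_hz / REF_HZ)


def decouple_pitch(f0_hz):
    """Split an F0 contour into (range_center, melody_offsets).

    f0_hz: per-frame F0 in Hz, with 0.0 marking unvoiced frames.
    range_center: mean pitch of voiced frames, in semitones (stands in for vocal range).
    melody_offsets: per-frame semitone offset from range_center, None if unvoiced.
    """
    voiced = [hz_to_semitones(f) for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    center = sum(voiced) / len(voiced)
    melody = [hz_to_semitones(f) - center if f > 0 else None for f in f0_hz]
    return center, melody


def shift_range(center, melody, semitones):
    """Rebuild an F0 contour in Hz with the same melody in a shifted range."""
    new_center = center + semitones
    return [
        REF_HZ * 2.0 ** ((m + new_center) / 12.0) if m is not None else 0.0
        for m in melody
    ]
```

With this split, the melody offsets stay the same when the range value changes, so a prompt-driven range setting could in principle be applied without altering the tune. For example, `shift_range(*decouple_pitch(contour), 12)` moves a contour up one octave.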
Statistics
Experiments show that our model achieves favorable controlling ability and audio quality.
The best R-FFE and MOS values are 0.09 and 3.90, respectively.
Fine-tuning the text encoders leads to a considerable improvement in controlling accuracy.
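For readers unfamiliar with the pitch metric, the sketch below computes the standard F0 Frame Error (FFE): the fraction of frames with either a voicing decision error or a pitch deviation of more than 20% from the reference. This is plain FFE only. How the R-FFE variant reported above differs is not specified in this summary, so the code should not be read as the paper's exact metric. The function name `f0_frame_error` and its signature are hypothetical.

```python
def f0_frame_error(ref_hz, est_hz, tol=0.2):
    """Plain F0 Frame Error between two aligned F0 contours.

    ref_hz, est_hz: per-frame F0 in Hz, with 0.0 marking unvoiced frames.
    tol: relative pitch tolerance (0.2 is the conventional 20% threshold).
    """
    if len(ref_hz) != len(est_hz) or not ref_hz:
        raise ValueError("contours must be non-empty and of equal length")
    errors = 0
    for r, e in zip(ref_hz, est_hz):
        ref_voiced, est_voiced = r > 0, e > 0
        if ref_voiced != est_voiced:
            errors += 1  # voicing decision error
        elif ref_voiced and abs(e - r) > tol * r:
            errors += 1  # gross pitch error
    return errors / len(ref_hz)
```

Lower is better: a value of 0.09 on this scale would mean roughly 9% of frames are in error.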
Quotes
"We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range, and volume with natural language."
"Our contributions are summarized as proposing the first controllable SVS model with natural language prompts."
"Our model achieves remarkable controlling capability and audio quality on prompt singing-voice-synthesis."