Melodist: A Novel Two-Stage Model for Controllable Text-to-Song Synthesis
Melodist is a two-stage text-to-song model that generates songs containing both vocals and accompaniment from text prompts. It uses tri-tower contrastive pretraining to learn text representations that make the synthesis controllable.
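
To make the idea of tri-tower contrastive pretraining concrete, the following is a minimal sketch and not the authors' implementation. It assumes the three towers embed text, vocals, and accompaniment (the summary does not name them), and it assumes a symmetric InfoNCE loss averaged over all three modality pairs. The function names `info_nce` and `tri_tower_loss`, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE over a batch; a[i] and b[i] form the matching pair."""
    a = [normalize(v) for v in a]
    b = [normalize(v) for v in b]
    n = len(a)
    # Similarity matrix: logits[i][j] compares item i of a with item j of b.
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(matrix):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            top = max(row)
            log_sum_exp = top + math.log(sum(math.exp(z - top) for z in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(transposed)
    )


def tri_tower_loss(text, vocal, accomp, temperature=0.1):
    """Average the pairwise losses over the three (assumed) towers."""
    return (
        info_nce(text, vocal, temperature)
        + info_nce(text, accomp, temperature)
        + info_nce(vocal, accomp, temperature)
    ) / 3.0


# Toy batch of two items with 2-D embeddings (hypothetical values).
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = aligned[::-1]

loss_aligned = tri_tower_loss(aligned, aligned, aligned)
loss_mismatched = tri_tower_loss(aligned, shuffled, aligned)
```

In this toy batch the loss is lower when all three embeddings of an item agree than when the vocal embeddings are swapped between items, which is the signal such a pretraining objective would use to pull matching text, vocal, and accompaniment representations together.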