Diffusion Model Loss-Guided Reinforcement Learning (DLPO) for Improving Text-to-Speech Diffusion Models
Reinforcement learning, and specifically the proposed Diffusion Model Loss-Guided Policy Optimization (DLPO), can improve the quality and naturalness of speech generated by text-to-speech diffusion models. DLPO fine-tunes the model against a reward derived from human feedback and adds the original diffusion model loss as a penalty during fine-tuning, so that the reward is optimized without the model drifting away from its original training objective.
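
To make the idea of a reward with a diffusion-loss penalty concrete, here is a minimal sketch in plain Python. It is one plausible reading of the summary above, not the authors' implementation: the function names, the weights `alpha` and `beta`, and the REINFORCE-style surrogate are all assumptions introduced for illustration.

```python
from typing import Sequence


def penalized_reward(reward: float, diffusion_loss: float,
                     alpha: float = 1.0, beta: float = 1.0) -> float:
    """Combine a feedback-derived reward with a diffusion-loss penalty.

    alpha and beta are hypothetical weights (assumptions, not from the source).
    """
    return alpha * reward - beta * diffusion_loss


def policy_gradient_surrogate(log_probs: Sequence[float],
                              rewards: Sequence[float],
                              diffusion_losses: Sequence[float],
                              alpha: float = 1.0,
                              beta: float = 1.0) -> float:
    """Average REINFORCE-style surrogate loss over a batch of samples.

    Each term is -(penalized reward) * log-probability of the sampled
    denoising step. In an autodiff framework the penalized reward would be
    treated as a constant weight, with gradients flowing only through the
    log-probability.
    """
    if not (len(log_probs) == len(rewards) == len(diffusion_losses)):
        raise ValueError("all inputs must have the same length")
    if len(log_probs) == 0:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for lp, r, dl in zip(log_probs, rewards, diffusion_losses):
        total += -penalized_reward(r, dl, alpha, beta) * lp
    return total / len(log_probs)


# Usage with made-up numbers (illustrative only, not experimental data).
loss = policy_gradient_surrogate(
    log_probs=[-1.0, -2.0],
    rewards=[4.0, 3.0],
    diffusion_losses=[0.5, 1.0],
    alpha=1.0,
    beta=2.0,
)
```

Under this reading, a sample with a high reward but a large diffusion loss receives a smaller effective weight, which is how the penalty discourages the fine-tuned model from departing too far from the behavior it learned during original training.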