NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
NaturalSpeech 3 introduces a factorized diffusion model that generates natural speech in a zero-shot manner by disentangling speech into separate attributes. The approach improves speech quality, similarity, prosody, and intelligibility.