Cross-Utterance Conditioned Variational Autoencoder for Enhancing Prosody and Naturalness in Speech Synthesis
The proposed Cross-Utterance Conditioned Variational Autoencoder (CUC-VAE) framework models prosody by conditioning on contextual information from surrounding utterances, with the aim of generating more natural and expressive speech.
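
To make the idea of conditioning a variational autoencoder on neighbouring utterances concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard conditional-VAE construction in which a context vector pooled from surrounding utterances parameterizes the prior over a prosody latent, and a posterior is pulled toward that prior with a KL term. All function names, the mean pooling, the identity "prior network", and the toy numbers are hypothetical choices made for illustration.

```python
import math
import random


def mean_pool(embeddings):
    """Average a list of equal-length vectors into one context vector."""
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


rng = random.Random(0)

# Toy 2-d embeddings of the previous and next utterance (made-up values).
context = mean_pool([[0.2, -0.1], [0.4, 0.3]])

# Hypothetical prior network: context -> (mu_p, logvar_p) for a 2-d latent.
mu_p = linear([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], context)
logvar_p = [0.0, 0.0]

# Hypothetical posterior parameters; in training these would come from an
# encoder that also observes the reference speech.
mu_q, logvar_q = [0.5, 0.0], [-1.0, -1.0]

z_train = reparameterize(mu_q, logvar_q, rng)   # sample used during training
z_infer = reparameterize(mu_p, logvar_p, rng)   # sample used at synthesis time
kl = kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
```

Under these assumptions, the latent at synthesis time is drawn from the context-dependent prior rather than a fixed standard Gaussian, which is one way surrounding utterances could influence the generated prosody.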