Enhancing Text-to-Speech Synthesis with Semantic Awareness using Llama-VITS
Llama-VITS enhances text-to-speech (TTS) synthesis by incorporating semantic embeddings from the large language model Llama2 into the VITS TTS model. It outperforms baseline TTS models in speech naturalness and emotional expressiveness.
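The summary does not say how the Llama2 embeddings reach the TTS model. The following is a minimal sketch of one common way to condition a TTS text encoder on a language-model embedding. It assumes that token-level Llama2 hidden states are mean-pooled into one utterance-level vector, that a learned linear projection maps this vector to the TTS hidden size, and that the result is added to every phoneme-level hidden state. The functions `mean_pool`, `linear`, and `fuse_semantic`, along with all toy dimensions and weights, are hypothetical illustrations and are not taken from Llama-VITS. Real Llama2 hidden states have thousands of dimensions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(token_embeddings: Matrix) -> Vector:
    """Average per-token LLM hidden states into one utterance-level vector.

    Hypothetical pooling choice; the source does not specify one.
    """
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]


def linear(vec: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Project vec from the LLM dim to the TTS dim (one weight row per output unit)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def fuse_semantic(text_hidden: Matrix, semantic: Vector,
                  weight: Matrix, bias: Vector) -> Matrix:
    """Add the projected semantic vector to every phoneme-level hidden state.

    Additive fusion is an assumption made for illustration only.
    """
    proj = linear(semantic, weight, bias)
    if any(len(state) != len(proj) for state in text_hidden):
        raise ValueError("TTS hidden size does not match projection size")
    return [[h + p for h, p in zip(state, proj)] for state in text_hidden]


# Toy example: 2 LLM tokens with dim 3, 3 phonemes with TTS dim 2 (made-up numbers).
llm_tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
semantic = mean_pool(llm_tokens)                 # [2.0, 2.0, 2.0]
weight = [[0.5, 0.0, 0.0], [0.0, 0.0, 1.0]]      # hypothetical learned projection
bias = [0.0, 0.5]
text_hidden = [[0.0, 0.0], [1.0, 1.0], [2.0, -1.0]]
fused = fuse_semantic(text_hidden, semantic, weight, bias)
print(fused)  # [[1.0, 2.5], [2.0, 3.5], [3.0, 1.5]]
```

Adding a single utterance-level vector to each phoneme state is the simplest conditioning choice. Concatenation, or attention over per-token embeddings, are alternatives. This sketch makes no claim about which approach Llama-VITS actually uses.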