Emphatic Expressive Text-to-Speech with Linguistic Information for Improved Expressiveness and Naturalness
EE-TTS is a text-to-speech model that uses multi-level linguistic information from syntax and semantics to generate expressive speech with appropriate emphasis. It outperforms baseline systems in both expressiveness and naturalness.