Generating Emotionally Expressive and Disfluent Speech for Conversational AI Systems
A novel speech synthesis pipeline that uses a large language model to generate emotional and disfluent speech patterns in a zero-shot manner, enabling more natural and relatable interactions with conversational AI systems.