Universal Speaker-Adaptive Text-to-Speech Approach Outperforms State-of-the-Art Methods Across Native and Non-Native English Speakers
The proposed Universal Speaker-Adaptive Text-to-Speech (USAT) framework unifies zero-shot and few-shot speaker adaptation in a single framework. It outperforms state-of-the-art methods at synthesizing natural speech that closely matches the target speaker's voice, for both native and non-native English speakers.