Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning
A novel semi-supervised contrastive learning approach that extracts disentangled style, emotion, and speaker representations from speech, enabling multi-speaker expressive speech synthesis.