Enhancing Real-Time Text-to-Speech Synthesis Efficiency through Consistency Models and Weighted Samplers
CM-TTS is a novel text-to-speech architecture based on consistency models. It achieves high-quality speech synthesis in fewer sampling steps, without adversarial training or dependencies on pre-trained models. Weighted samplers are introduced to mitigate biases during model training.
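
To make the idea of a weighted sampler concrete, the following is a minimal sketch and not the CM-TTS implementation. It assumes that training draws a discrete step index for each example, and that a sampler assigns each index a non-negative weight instead of drawing uniformly. The function name `make_weighted_sampler`, the `weights` list, and the example values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def make_weighted_sampler(weights, seed=None):
    """Return a function that draws step indices with probability
    proportional to `weights` (a hypothetical per-step weighting)."""
    if not weights or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = random.Random(seed)      # private generator, so draws are reproducible
    indices = range(len(weights))  # candidate step indices 0..N-1

    def sample(batch_size):
        # random.choices draws with replacement using relative weights.
        return rng.choices(indices, weights=weights, k=batch_size)

    return sample


if __name__ == "__main__":
    # Illustrative weights only: favour later steps over earlier ones.
    sampler = make_weighted_sampler([1.0, 2.0, 4.0, 8.0], seed=0)
    counts = Counter(sampler(10_000))
    for step in sorted(counts):
        print(step, counts[step])
```

Setting all weights equal recovers uniform sampling, so a uniform baseline and a weighted variant can be compared by changing only the `weights` argument.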