Efficient Neural Codec Language Modeling for Zero-Shot Text-to-Speech Synthesis
CLaM-TTS employs probabilistic residual vector quantization (RVQ) to achieve superior compression in token length and enable a language model to generate multiple tokens at once, thereby enhancing the efficiency of zero-shot text-to-speech synthesis.
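To make the core idea concrete, the following is a minimal sketch of plain (deterministic, greedy) residual vector quantization: each stage picks the codebook entry nearest the current residual and subtracts it, so a vector is represented by one token index per stage. This is an illustrative assumption, not the paper's method; CLaM-TTS uses a probabilistic variant of RVQ, and the codebooks and vectors here are toy values.

```python
import math

def residual_vector_quantize(x, codebooks):
    """Greedy RVQ sketch: each stage selects the code nearest the
    current residual and subtracts it; the chosen per-stage indices
    are the discrete tokens, and the sum of chosen codes is the
    reconstruction."""
    residual = list(x)
    indices = []
    for cb in codebooks:  # cb: list of code vectors for this stage
        k = min(range(len(cb)),
                key=lambda j: math.dist(residual, cb[j]))
        indices.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    reconstruction = [xi - ri for xi, ri in zip(x, residual)]
    return indices, reconstruction

# Toy 2-D example with two 2-entry codebooks (hypothetical values).
codebooks = [
    [[1.0, 0.0], [0.0, 1.0]],    # coarse stage
    [[0.25, 0.0], [0.0, 0.25]],  # refines the stage-1 residual
]
idx, x_hat = residual_vector_quantize([1.2, 0.1], codebooks)
# stage 1 picks [1.0, 0.0]; stage 2 quantizes the residual [0.2, 0.1]
```

Each additional stage refines the residual left by the previous one, which is what lets a deep RVQ represent a frame compactly as a short stack of token indices.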