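MultiVerse is a zero-shot TTS system that, using only a small amount of data, achieves performance comparable to existing TTS models trained on large-scale data, and it also supports speech style transfer.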
MultiVerse is a text-to-speech (TTS) system that achieves high-quality zero-shot, multi-task performance across a range of conditions, including cross-lingual and speech style transfer, while using significantly less training data than traditional data-driven approaches. It does so by combining disentanglement based on source-filter theory with a hybrid prosody modeling approach.
A zero-shot text-to-speech model that uses multi-scale acoustic prompts: a style prompt to capture the target speaker's personal speaking style and a timbre prompt to preserve the speaker's voice characteristics. It outperforms state-of-the-art language model-based approaches in naturalness and speaker similarity.