The first controllable singing voice synthesis (SVS) model to use natural language prompts to control singer gender, vocal range, and volume.
InstructSing, a novel neural vocoder, generates high-quality 48 kHz singing voices while converging much faster than other state-of-the-art neural vocoders.
Muskits-ESPnet introduces new paradigms for singing voice synthesis by integrating pretrained audio models and exploring discrete representations, enhancing model capability and efficiency while automating the entire data processing workflow.
SongTrans is a unified model that can directly transcribe and align song lyrics and musical notes without requiring pre-processing or separate tools.
SiFiSinger is a novel end-to-end singing voice synthesis system that leverages source-filter modeling and differentiable reconstruction losses to improve pitch accuracy and overall audio quality compared to previous systems like VISinger 2.
SiFiSinger is a new end-to-end singing voice synthesis system based on the source-filter mechanism of voice production, offering improved audio quality and accurate pitch control.