Cross-Speaker Style Transfer for TTS Enhanced with Singing Voice Conversion, Style Filtering, and F0 Matching
This research proposes a method for cross-speaker style transfer in Text-to-Speech (TTS) systems. It combines a pre-trained singing voice conversion (SVC) model, fundamental frequency (F0) matching, and style filtering to improve the transfer of expressive styles from a source speaker to a target speaker for whom only a limited amount of neutral data is available.
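The summary does not say how F0 matching is performed. As a rough illustration only, the sketch below assumes one common choice from the voice conversion literature: rescaling the source contour so that its log-F0 mean and standard deviation match those of the target speaker. The function names `log_f0_stats` and `match_f0`, and the convention that unvoiced frames are encoded as 0, are assumptions made for this example and are not taken from the work itself.

```python
import math


def log_f0_stats(f0_hz):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_hz if f > 0]
    if not logs:
        raise ValueError("contour contains no voiced frames")
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    return mean, math.sqrt(var)


def match_f0(source_f0_hz, target_mean, target_std):
    """Map a source F0 contour onto the target speaker's log-F0 statistics.

    Assumed convention: unvoiced frames are stored as 0 and are left as 0.
    """
    src_mean, src_std = log_f0_stats(source_f0_hz)
    # A flat source contour has zero spread; fall back to a pure mean shift.
    scale = target_std / src_std if src_std > 0 else 1.0
    return [
        math.exp((math.log(f) - src_mean) * scale + target_mean) if f > 0 else 0.0
        for f in source_f0_hz
    ]


if __name__ == "__main__":
    # Hypothetical contours in Hz; real contours would come from a pitch tracker.
    expressive_source = [200.0, 0.0, 220.0, 240.0]
    neutral_target = [100.0, 110.0, 120.0]
    t_mean, t_std = log_f0_stats(neutral_target)
    print(match_f0(expressive_source, t_mean, t_std))
```

Working in the log domain keeps pitch intervals proportional, so the shape of the expressive contour is preserved while its register moves into the target speaker's range.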