Leveraging Text-to-Speech Knowledge for Robust Open Vocabulary Keyword Spotting
A novel framework that uses intermediate representations extracted from a pre-trained text-to-speech (TTS) model to improve open-vocabulary keyword spotting.