Multilingual Phoneme-based Speech Processing: Towards Open-Vocabulary Keyword Spotting and Forced Alignment in Any Language
Phoneme-based models can achieve strong cross-linguistic generalization to unseen languages for open-vocabulary keyword spotting and zero-shot forced alignment.