A Multilingual Text-Independent Phone-to-Audio Alignment System Using Self-Supervised Learning and Knowledge Transfer
A novel approach to text-independent phone-to-audio alignment using self-supervised learning, representation learning, and knowledge transfer. It outperforms the state of the art and adapts to diverse English accents and to other languages.