Improving Automatic Speech Recognition with Pronunciation-Aware Transducer Models
Transducers with Pronunciation-aware Embeddings (PET) can improve speech recognition accuracy by sharing components of the decoder embeddings across text tokens that have the same or similar pronunciations.
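
To make the idea of shared embedding components concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the additive combination of a per-token vector and a per-pronunciation vector, the toy vocabulary, the phoneme strings, and the helper names `build_embedding_tables` and `embed` are all assumptions chosen for illustration.

```python
import random


def build_embedding_tables(vocab, pron_of, dim=4, seed=0):
    """Create one vector per token and one vector per distinct pronunciation.

    Hypothetical helper: a real model would learn these vectors during
    training; here they are random placeholders.
    """
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    token_part = {tok: rand_vec() for tok in vocab}
    # Tokens with the same pronunciation key map to the same shared vector.
    pron_keys = sorted({pron_of[tok] for tok in vocab})
    pron_part = {key: rand_vec() for key in pron_keys}
    return token_part, pron_part


def embed(token, token_part, pron_part, pron_of):
    """Decoder embedding = token-specific part + pronunciation-shared part.

    Summation is an assumption made for this sketch; the source only states
    that tokens with similar pronunciations share embedding components.
    """
    own = token_part[token]
    shared = pron_part[pron_of[token]]
    return [o + s for o, s in zip(own, shared)]


# Toy vocabulary (assumed): "two" and "too" are homophones, "cat" is not.
vocab = ["two", "too", "cat"]
pron_of = {"two": "T UW", "too": "T UW", "cat": "K AE T"}

token_part, pron_part = build_embedding_tables(vocab, pron_of)
e_two = embed("two", token_part, pron_part, pron_of)
e_too = embed("too", token_part, pron_part, pron_of)
e_cat = embed("cat", token_part, pron_part, pron_of)
```

In this sketch, `e_two` and `e_too` differ only in their token-specific parts, so any update to the shared "T UW" vector would affect both homophones at once, while `e_cat` draws on a different shared vector.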