Enhancing Hybrid Autoregressive Transducer-based Automatic Speech Recognition with Internal Acoustic Model Training and Dual Blank Thresholding
Jointly training a Hybrid Autoregressive Transducer (HAT) with various Connectionist Temporal Classification (CTC) objectives, including the proposed Internal Acoustic Model (IAM), improves HAT-based automatic speech recognition performance. Deploying dual blank thresholding, which combines HAT-blank and IAM-blank thresholding, along with a compatible decoding algorithm, achieves a 42-75% increase in decoding speed without significant degradation in accuracy.
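
To make the idea of dual blank thresholding concrete, the following is a minimal greedy-decoding sketch, not the authors' decoding algorithm. It assumes that the IAM supplies a per-frame blank probability computed from the encoder alone, and that the HAT joiner supplies a separate blank probability b, with label probabilities factorized as (1 - b) times a label softmax. The function and parameter names (`dual_blank_greedy_decode`, `iam_threshold`, `hat_threshold`, `max_symbols_per_frame`) and the threshold values are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def dual_blank_greedy_decode(
    iam_blank_probs: Sequence[float],
    hat_blank_prob: Callable[[int, List[int]], float],
    label_probs: Callable[[int, List[int]], Sequence[float]],
    iam_threshold: float = 0.9,   # hypothetical value
    hat_threshold: float = 0.9,   # hypothetical value
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy decoding with two blank thresholds (illustrative sketch)."""
    hyp: List[int] = []
    for t, p_iam in enumerate(iam_blank_probs):
        # IAM-blank thresholding: the encoder-only model is confident this
        # frame is blank, so skip the predictor and joiner entirely.
        if p_iam >= iam_threshold:
            continue
        for _ in range(max_symbols_per_frame):
            b = hat_blank_prob(t, hyp)
            # HAT-blank thresholding: the joiner is confident in blank, so
            # skip the label softmax and move on to the next frame.
            if b >= hat_threshold:
                break
            probs = label_probs(t, hyp)
            best = max(range(len(probs)), key=probs.__getitem__)
            # HAT factorization: P(label) = (1 - b) * softmax(label).
            if b >= (1.0 - b) * probs[best]:
                break
            hyp.append(best)
    return hyp


# Toy usage with hand-written scores (all values are made up).
joiner_calls: List[int] = []
target_len = {1: 1, 3: 2}  # hypothesis length after which each frame emits blank


def toy_hat_blank(t: int, hyp: List[int]) -> float:
    joiner_calls.append(t)
    return 0.95 if len(hyp) >= target_len.get(t, 0) else 0.05


def toy_label_probs(t: int, hyp: List[int]) -> Sequence[float]:
    return [0.1, 0.1, 0.8] if t == 1 else [0.7, 0.2, 0.1]


decoded = dual_blank_greedy_decode(
    [0.95, 0.1, 0.99, 0.2], toy_hat_blank, toy_label_probs
)
print(decoded)       # [2, 0]
print(joiner_calls)  # frames 0 and 2 never reach the joiner
```

In this toy run, frames 0 and 2 are dropped by the IAM-blank threshold before any joiner call, which is where a speed-up of this kind would come from; the second threshold only saves the label softmax on the frames that remain.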