Multi-Modal Pre-Training and Mid-Training Strategies for Improved Automatic Speech Recognition
Combining multi-modal pre-training with a novel mid-training translation task leads to significant improvements in automatic speech recognition performance.