Scaling Large Speech Recognition Models with Mixture-of-Experts: Achieving Dense-1B Accuracy at Dense-225M Inference Cost
A simple and effective approach to scaling speech recognition models with Mixture-of-Experts (MoE) layers, achieving Dense-1B-level accuracy at Dense-225M-level inference cost while also enabling streaming recognition.
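To make the core idea concrete, the sketch below shows how a Mixture-of-Experts layer can stand in for the dense feed-forward block of a speech encoder: capacity grows with the number of experts, while each frame only pays the compute of the few experts it is routed to. This is a minimal illustration assuming PyTorch and top-2 routing, not the authors' implementation; the class name `MoEFeedForward` and all hyperparameters are placeholders.

```python
# Minimal MoE feed-forward sketch (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Drop-in replacement for a dense FFN: many experts, few used per frame."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # per-frame routing logits
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); route each frame independently.
        b, t, d = x.shape
        flat = x.reshape(b * t, d)
        gate_logits = self.router(flat)                          # (b*t, num_experts)
        weights, indices = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                     # renormalize over chosen experts

        out = torch.zeros_like(flat)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                     # frames routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(flat[mask])
        return out.reshape(b, t, d)


# Total parameters scale with num_experts, but each frame only activates top_k experts,
# which is how a ~1B-parameter model can keep roughly the compute of a ~225M dense one.
x = torch.randn(4, 100, 512)  # (batch, frames, features)
layer = MoEFeedForward(d_model=512, d_ff=2048, num_experts=8, top_k=2)
print(layer(x).shape)         # torch.Size([4, 100, 512])
```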