Improving Parameter Efficiency of Mixture-of-Experts Language Models through Dense Training and Sparse Inference
Employing dense training and sparse inference to improve the parameter efficiency of Mixture-of-Experts (MoE) language models while maintaining performance comparable to that of dense models.
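
Below is a minimal sketch (not the paper's implementation) of the dense-training / sparse-inference idea, assuming a PyTorch-style MoE feed-forward layer: during training every expert processes each token, weighted by the full softmax of the router, while at inference only the top-k experts per token are evaluated. All names (`DenseTrainSparseInferMoE`, `d_model`, `num_experts`, `top_k`) are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseTrainSparseInferMoE(nn.Module):
    """MoE FFN layer: all experts active in training, top-k experts at inference."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); gate: (tokens, num_experts)
        gate = F.softmax(self.router(x), dim=-1)
        if self.training:
            # Dense training: every expert sees every token, outputs mixed by full gate.
            expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (tokens, E, d_model)
            return torch.einsum("te,ted->td", gate, expert_out)
        # Sparse inference: keep only the top-k router scores per token and renormalize.
        topk_val, topk_idx = gate.topk(self.top_k, dim=-1)
        topk_val = topk_val / topk_val.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e_idx, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e_idx
                if mask.any():
                    out[mask] += topk_val[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = DenseTrainSparseInferMoE(d_model=16, d_hidden=32, num_experts=4, top_k=2)
    tokens = torch.randn(8, 16)
    layer.train(); dense_out = layer(tokens)    # all experts contribute
    layer.eval();  sparse_out = layer(tokens)   # only top-2 experts per token
    print(dense_out.shape, sparse_out.shape)
```

The sketch only illustrates the switch between the dense and sparse compute paths; how the router is trained and how experts are derived or merged is specific to the paper and not reproduced here.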