Pre-gated MoE is an algorithm-system co-design that enables fast and memory-efficient deployment of Mixture-of-Experts (MoE) based large language models.
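
Below is a minimal, illustrative sketch of the core pre-gating idea, not the authors' implementation: each block's pre-gate selects the experts for the *next* block, so the serving system can prefetch those experts' weights (e.g. from CPU to GPU memory) while the current block is still computing. The class name `PreGatedMoEBlock` and the simplified batch-level top-k routing are assumptions made for brevity.

```python
# Sketch of pre-gating: gate decisions are shifted one block ahead so
# expert-weight fetches can overlap with computation.
import torch
import torch.nn as nn


class PreGatedMoEBlock(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        # Pre-gate: scores experts for the *next* block, not this one.
        self.pre_gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, active_experts: list[int]):
        # 1) Choose experts for the NEXT block first; a real system would
        #    launch an asynchronous weight prefetch for them here.
        scores = self.pre_gate(x).mean(dim=0)                # (n_experts,)
        next_experts = scores.topk(self.top_k).indices.tolist()

        # 2) Run this block with the experts chosen by the PREVIOUS
        #    block's pre-gate (their weights are already resident).
        y = torch.stack([self.experts[i](x) for i in active_experts])
        return x + y.mean(dim=0), next_experts
```

Decoupling expert selection from expert execution in this way hides the expert-fetch latency, which is what lets the bulk of the expert weights stay off the GPU without stalling inference.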