Unlocking Emergent Modularity in Large Language Models to Enhance Downstream Generalization
Emergent modularity arises spontaneously in pre-trained language models, and unlocking it by fine-tuning the model as an Emergent Mixture-of-Experts (EMoE) can improve both in-domain and out-of-domain generalization on downstream tasks.
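
To make the core idea concrete, below is a minimal, hypothetical sketch of viewing a pre-trained dense FFN as a mixture-of-experts: the intermediate neurons are partitioned into expert groups, and only the top-k groups are activated per token, adding no new parameters. All names here (`EmergentMoEFFN`, the centroid-based gate, `num_experts`, `top_k`) are illustrative assumptions, not the paper's exact EMoE construction, which may use a different neuron clustering and gating scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmergentMoEFFN(nn.Module):
    """Re-views a pre-trained FFN (ffn_in: d -> h, ffn_out: h -> d) as
    num_experts experts by partitioning the h intermediate neurons.
    Only the top_k experts are active per token; no parameters are added."""

    def __init__(self, ffn_in: nn.Linear, ffn_out: nn.Linear,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        h = ffn_in.out_features
        assert h % num_experts == 0, "intermediate size must split evenly"
        self.num_experts, self.top_k = num_experts, top_k
        self.expert_size = h // num_experts
        self.ffn_in, self.ffn_out = ffn_in, ffn_out
        # Parameter-free gate: represent each expert by the mean of its
        # input-projection rows (one plausible heuristic; the actual EMoE
        # gating may differ).
        with torch.no_grad():
            centroids = ffn_in.weight.view(num_experts, self.expert_size, -1).mean(dim=1)
        self.register_buffer("centroids", centroids)  # shape (E, d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d). Score experts by each token's similarity to the centroids.
        scores = x @ self.centroids.T                         # (..., E)
        topk = scores.topk(self.top_k, dim=-1).indices        # (..., k)
        # Turn the chosen experts into a 0/1 mask over intermediate neurons.
        expert_mask = F.one_hot(topk, self.num_experts).sum(-2).bool()        # (..., E)
        neuron_mask = expert_mask.repeat_interleave(self.expert_size, dim=-1)  # (..., h)
        # For clarity this masks the full FFN activation; a real implementation
        # would gather only the active neurons to save compute.
        hidden = F.gelu(self.ffn_in(x)) * neuron_mask
        return self.ffn_out(hidden)

# Usage: wrap an existing FFN from a pre-trained transformer block.
d_model, d_hidden = 16, 64
dense_in, dense_out = nn.Linear(d_model, d_hidden), nn.Linear(d_hidden, d_model)
moe_ffn = EmergentMoEFFN(dense_in, dense_out, num_experts=8, top_k=2)
out = moe_ffn(torch.randn(4, d_model))
print(out.shape)  # torch.Size([4, 16])
```

Because the experts are carved out of the existing weights rather than newly initialized, fine-tuning such a module only changes which neurons fire together per token, which is what lets the latent modularity of the pre-trained FFN surface.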