Efficient Prompt-Prompted Mixture of Experts for Large Language Model Generation
GRIFFIN is a novel, training-free Mixture of Experts (MoE) method that selects unique feedforward (FF) experts at the sequence level, enabling efficient generation across a variety of large language models with different non-ReLU activation functions while preserving the original model's performance.
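A minimal sketch (not the authors' released code) of what sequence-level expert selection could look like for a SwiGLU-style gated FF block. The neuron-scoring statistic, the `keep_frac` parameter, and both function names are illustrative assumptions: prompt activations are used once to pick a subset of FF neurons ("experts"), and only that pruned subset is used for every subsequently generated token.

```python
# Hypothetical sketch of prompt-based FF expert selection, assuming a
# SwiGLU-style FFN; scoring statistic and names are illustrative.
import torch
import torch.nn.functional as F


def select_ff_experts(x_prompt, w_gate, w_up, w_down, keep_frac=0.5):
    """Select FF neurons ("experts") from the prompt's activations and
    return pruned weight matrices for use during generation.

    x_prompt: (T, d) hidden states of the prompt tokens
    w_gate, w_up: (d, d_ff); w_down: (d_ff, d)
    """
    # Gated activations for each prompt token: (T, d_ff).
    z = F.silu(x_prompt @ w_gate) * (x_prompt @ w_up)
    # Normalize per token so no single token dominates, then score each
    # neuron by the norm of its normalized activations over the prompt.
    z_norm = z / (z.norm(dim=-1, keepdim=True) + 1e-8)
    scores = z_norm.norm(dim=0)                       # (d_ff,)
    k = max(1, int(keep_frac * scores.numel()))
    idx = scores.topk(k).indices                      # chosen neurons
    # Prune: keep the selected columns of the up/gate projections and
    # the matching rows of the down projection.
    return w_gate[:, idx], w_up[:, idx], w_down[idx, :]


def ff_generate_step(x, w_gate_k, w_up_k, w_down_k):
    """FF forward pass during generation using the reduced expert set."""
    return (F.silu(x @ w_gate_k) * (x @ w_up_k)) @ w_down_k
```

Because selection happens once per sequence rather than per token, generation pays only the cost of the smaller matrices, and no training or router is required.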