Core Concept
LoRAMoE is a novel framework that addresses the conflict between improving LLM performance on downstream tasks and preventing world knowledge forgetting during supervised fine-tuning (SFT).
Summary
Abstract: The LoRAMoE framework is proposed to address world knowledge forgetting in LLMs during SFT.
Introduction: Supervised fine-tuning is crucial for LLMs, but large-scale instruction data can damage the world knowledge stored in them.
Data Extraction:
"Experimental results show that, as the instruction data increases, LoRAMoE can significantly improve the ability to process downstream tasks, while maintaining the world knowledge stored in the LLM."
Motivation: Large-scale SFT can cause irreversible damage to world knowledge in LLMs.
Architecture: LoRAMoE uses multiple LoRAs as experts, combined by a router network, to maintain world knowledge while enhancing downstream task performance (see the layer sketch after this list).
Localized Balancing Constraint: Balances expert utilization while steering some experts toward world knowledge tasks and others toward downstream tasks (see the loss sketch after this list).
Experiments: LoRAMoE outperforms direct SFT and LoRA tuning in maintaining world knowledge and improving multitasking abilities.
Sensitivity Analysis: LoRAMoE's performance remains stable as the number of experts and the LoRA rank vary.
Visualizing Expert Utilization: Experts specialized for world knowledge and for downstream tasks show distinct utilization patterns.
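To make the architecture concrete, here is a minimal PyTorch sketch of a LoRAMoE-style layer, not the authors' implementation: a frozen base linear layer is augmented with several low-rank LoRA experts whose outputs are mixed by a softmax router. Class and parameter names (`LoRAExpert`, `LoRAMoELayer`, `num_experts`, `rank`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: x -> (x @ A) @ B, scaled by alpha / rank."""

    def __init__(self, d_in: int, d_out: int, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.A) @ self.B * self.scale


class LoRAMoELayer(nn.Module):
    """A frozen base linear layer plus a router-weighted mixture of LoRA experts."""

    def __init__(self, base: nn.Linear, num_experts: int = 6, rank: int = 4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze: world knowledge stays in the backbone
            p.requires_grad = False
        d_in, d_out = base.in_features, base.out_features
        self.experts = nn.ModuleList(
            LoRAExpert(d_in, d_out, rank) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_in, num_experts)  # soft token-level routing

    def forward(self, x: torch.Tensor):
        gates = F.softmax(self.router(x), dim=-1)                       # (..., E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (..., d_out, E)
        moe_out = (expert_out * gates.unsqueeze(-2)).sum(dim=-1)        # weighted sum
        return self.base(x) + moe_out, gates


# Usage: wrap an existing linear layer; only experts and router receive gradients.
layer = LoRAMoELayer(nn.Linear(768, 768))
y, gates = layer(torch.randn(2, 16, 768))  # y: (2, 16, 768), gates: (2, 16, 6)
```

Because the backbone is frozen, its pretrained world knowledge is untouched; the trainable experts absorb the task-specific updates from SFT.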
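The localized balancing constraint can likewise be sketched. The idea: weight each expert's per-sample router importance by whether the expert's assigned group matches the sample's type, then penalize dispersion of the weighted importance, so that matched experts end up with more routing weight while utilization stays even within each group. The coefficient form (1 ± delta), the CV²-style penalty, and the names `lbc_loss`, `expert_type`, `sample_type` are assumptions for illustration, not the paper's exact equations.

```python
import torch


def lbc_loss(gates: torch.Tensor, expert_type: torch.Tensor,
             sample_type: torch.Tensor, delta: float = 0.1) -> torch.Tensor:
    """Localized balancing constraint (sketch).

    gates:       (batch, num_experts) per-sample router importance
                 (e.g. gate values averaged over tokens)
    expert_type: (num_experts,) 0 = world knowledge expert, 1 = downstream expert
    sample_type: (batch,) task type of each sample, 0 or 1
    delta:       strength of the type preference (assumed hyperparameter)
    """
    match = (expert_type.unsqueeze(0) == sample_type.unsqueeze(1)).float()  # (B, E)
    # Matching experts get the smaller coefficient, so equalizing the weighted
    # importance pushes MORE raw routing weight onto them; mismatched experts
    # get the larger coefficient and end up with less.
    coeff = 1.0 + delta - 2.0 * delta * match  # 1 - delta if match else 1 + delta
    z = coeff * gates                          # type-weighted importance
    # Coefficient-of-variation-style penalty: variance over squared mean.
    return z.var(unbiased=False) / (z.mean() ** 2 + 1e-8)
```

A training step would add `beta * lbc_loss(gates.mean(dim=1), expert_type, sample_type)` to the task loss, with `beta` an assumed weighting hyperparameter.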