The paper introduces HydraLoRA, an improved LoRA framework for efficient fine-tuning of large language models (LLMs). The key insights are:
Using a single LoRA for the entire domain dataset can lead to interference between tasks, reducing performance. Instead, deploying multiple smaller LoRA heads, each dedicated to a specific downstream task, proves more effective.
Analysis of LoRA parameters reveals that the matrix A tends to capture commonalities across domains, while matrix B adapts to domain-specific diversities. This observation motivates an asymmetric LoRA architecture.
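To make this asymmetry concrete, the following PyTorch sketch pairs one shared A matrix with several task-specific B heads mixed by external weights. It is a minimal illustration under stated assumptions: the class name, dimensions, head count, and initialization constants are illustrative choices, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class AsymmetricLoRALayer(nn.Module):
    """Illustrative asymmetric LoRA: one shared A matrix, K task-specific B heads."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, num_heads: int = 3):
        super().__init__()
        # Frozen pretrained weight (stand-in for an attention/MLP projection).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Shared A captures cross-task commonalities (small random init, as in standard LoRA).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        # One B per intrinsic component captures task-specific diversity (zero init).
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, rank)) for _ in range(num_heads)]
        )

    def forward(self, x: torch.Tensor, head_weights: torch.Tensor) -> torch.Tensor:
        # head_weights: (num_heads,) mixing coefficients, e.g. produced by a router.
        shared = x @ self.lora_A.T                                  # project into rank-r subspace
        delta = sum(w * (shared @ B.T) for w, B in zip(head_weights, self.lora_B))
        return self.base(x) + delta
```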
HydraLoRA has the following components:
Initialization: HydraLoRA uses K-means clustering to adaptively identify the optimal number of intrinsic components (K) within the heterogeneous corpus (a selection heuristic is sketched after this list).
Training: HydraLoRA employs a Mixture-of-Experts (MoE) framework, treating each LoRA as an expert. A trainable router automatically segregates training samples into the appropriate intrinsic components.
Inference: HydraLoRA merges the multiple LoRA experts in a flexible and dynamic manner using the trainable router, enabling efficient inference (a combined routing sketch follows).
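For the initialization step, here is a minimal sketch of one way adaptive K selection could be wired up, assuming scikit-learn and some external text encoder that produces per-sample embeddings. The function name `choose_num_heads`, the candidate range, and the silhouette criterion are assumptions for illustration; the paper's exact selection rule may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_num_heads(embeddings: np.ndarray, k_range=range(2, 9), seed: int = 0) -> int:
    """Pick the number of intrinsic components K by clustering sample embeddings.

    `embeddings` is an (n_samples, dim) array of instruction/example embeddings
    produced by any encoder; the silhouette criterion is one reasonable heuristic.
    """
    best_k, best_score = k_range[0], -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(embeddings)
        score = silhouette_score(embeddings, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Toy usage with random stand-in embeddings (replace with real encoder outputs).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_embeddings = rng.normal(size=(500, 64))
    print("selected K =", choose_num_heads(fake_embeddings))
```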
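The training and inference behaviour can be sketched together, since the same trainable router that softly assigns samples to experts during training also produces the mixing weights for the dynamic merge at inference. The block below is a simplified assumption-laden sketch (sequence-level routing, no scaling factor or dropout), not the authors' implementation.

```python
import torch
import torch.nn as nn

class HydraLoRABlock(nn.Module):
    """Sketch of a router-gated block: the router softly assigns each sample to B heads."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8, num_heads: int = 3):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)        # shared A
        self.lora_B = nn.Parameter(torch.zeros(num_heads, out_features, rank))   # K expert B heads
        self.router = nn.Linear(in_features, num_heads)                          # trainable gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features); gates: (batch, num_heads) soft assignment per sample.
        gates = torch.softmax(self.router(x), dim=-1)
        shared = x @ self.lora_A.T                                    # (batch, rank)
        per_head = torch.einsum("br,kor->bko", shared, self.lora_B)   # (batch, K, out_features)
        delta = torch.einsum("bk,bko->bo", gates, per_head)           # gate-weighted merge
        return self.base(x) + delta
```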
Experiments show that HydraLoRA outperforms other PEFT approaches, including those that rely on domain knowledge, across various benchmarks. It also demonstrates improved training efficiency in terms of energy consumption and latency.