Key Concepts
LongLoRA presents an efficient fine-tuning approach that extends the context windows of large language models, cutting computational cost relative to full fine-tuning while maintaining performance.
Summary
Abstract:
LongLoRA introduces an efficient fine-tuning method to extend context sizes of large language models (LLMs) with minimal computational cost.
It combines shifted sparse attention (S2-Attn) and improved LoRA for context extension, demonstrating strong empirical results on various tasks.
Introduction:
Large language models (LLMs) are typically trained with a pre-defined context size, which limits their use on tasks that require long inputs.
Recent works have attempted to extend context lengths, but full fine-tuning at long context is computationally expensive.
LongLoRA Approach:
LongLoRA efficiently extends context windows of pre-trained LLMs, combining LoRA with S2-Attn.
S2-Attn splits the context length into groups and computes self-attention within each group; shifting the group partition by half the group size in half of the attention heads lets information flow between neighboring groups.
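A minimal PyTorch sketch of this step, modeled on the pseudocode style in the paper; the function name s2_attn, the (B, N, 3, H, D) tensor layout, and the use of F.scaled_dot_product_attention are illustrative assumptions, and causal masking is omitted:

```python
import torch
import torch.nn.functional as F

def s2_attn(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    """Shifted sparse attention over qkv of shape (B, N, 3, H, D).

    N must be divisible by group_size; causal masking and the special
    handling of the wrapped-around group are omitted for brevity.
    """
    B, N, _, H, D = qkv.shape
    G = group_size

    # Shift half of the attention heads by G // 2 tokens so their groups
    # straddle the boundaries of the unshifted groups, enabling information
    # flow between neighboring groups.
    qkv = torch.cat(
        (qkv.chunk(2, dim=3)[0],
         qkv.chunk(2, dim=3)[1].roll(-G // 2, dims=1)),
        dim=3,
    )

    # Partition the sequence into groups and run standard attention per group.
    qkv = qkv.reshape(B * N // G, G, 3, H, D)
    q, k, v = (t.transpose(1, 2) for t in qkv.unbind(dim=2))  # (B*N/G, H, G, D)
    out = F.scaled_dot_product_attention(q, k, v)
    out = out.transpose(1, 2).reshape(B, N, H, D)

    # Roll the shifted heads back so token positions line up again.
    out = torch.cat(
        (out.chunk(2, dim=2)[0],
         out.chunk(2, dim=2)[1].roll(G // 2, dims=1)),
        dim=2,
    )
    return out
```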
Improved LoRA:
Making the embedding and normalization layers trainable, in addition to the low-rank adapter weights, is crucial for effective long-context adaptation (see the sketch after this subsection).
LongLoRA achieves promising results on extending context lengths for different model sizes.
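A minimal sketch of this recipe, assuming a Llama-style model whose embedding and normalization parameter names contain "embed" and "norm", and LoRA adapters whose parameter names contain "lora_" (as in common LoRA implementations); this is not the authors' exact training script:

```python
import torch.nn as nn

def mark_trainable_for_long_context(model: nn.Module) -> None:
    """Freeze the base model, keep the LoRA adapter weights trainable, and
    additionally unfreeze the embedding and normalization layers."""
    for name, param in model.named_parameters():
        param.requires_grad = any(
            key in name for key in ("lora_", "embed", "norm")
        )
```

These layers account for only a small fraction of the model's parameters, so unfreezing them adds little training overhead.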
Experimental Results:
Models fine-tuned with LongLoRA achieve better perplexity at longer context sizes, while requiring far less training time and memory than full fine-tuning.
The method is effective in supervised fine-tuning and shows compatibility with various LLMs and position encodings.
Statistics
Training at a context length of 8192 incurs 16× the computational cost in self-attention layers compared to training at 2048 (see the check below).
LongLoRA extends Llama2 7B from 4k context to 100k, or Llama2 70B to 32k on a single 8× A100 machine.
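The 16× figure above follows from the quadratic scaling of standard self-attention with sequence length; as a quick check:

\[(8192 / 2048)^2 = 4^2 = 16\]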
Quotes
"LongLoRA combines shifted sparse attention (S2-Attn) with improved LoRA for efficient context extension."