Core Concepts
LongLoRA presents an efficient fine-tuning approach to extend the context length of large language models, reducing computational cost while maintaining performance.
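A minimal sketch of the fine-tuning setup, assuming PyTorch with the Hugging Face transformers and peft libraries (the model id, LoRA hyperparameters, and parameter-name matching below are illustrative, not the official LongLoRA training script). The idea of the paper's improved LoRA is to attach low-rank adapters to the attention projections while also unfreezing the embedding and normalization layers, which the authors find important for long-context adaptation.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model (assumption: Llama2 7B from the Hugging Face Hub).
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)

# Low-rank adapters on the attention projections (r and alpha are illustrative).
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

# Improved LoRA: besides the adapters, make embedding and norm layers trainable.
for name, param in model.named_parameters():
    if "embed" in name or "norm" in name:
        param.requires_grad = True

model.print_trainable_parameters()
```

The trainable fraction stays small (adapters plus embeddings and norms), which is what keeps the context extension cheap relative to full fine-tuning.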
Stats
Training on a context length of 8192 requires 16× the computational cost in self-attention layers compared to a context length of 2048, since attention cost grows quadratically with sequence length: (8192/2048)² = 16.
LongLoRA extends the context of Llama2 7B from 4k to 100k tokens, and Llama2 70B to 32k, on a single 8× A100 machine.
Quotes
"LongLoRA combines shifted sparse attention (S2-Attn) with improved LoRA for efficient context extension."