Core Concepts
The multi-head attention already present in pre-trained LLMs allows LONGHEADS to process long contexts efficiently without additional training.
Abstract
The article introduces LONGHEADS, a training-free framework that enhances the ability of large language models (LLMs) to process long contexts efficiently. Rather than attending to the full sequence, it uses a chunk selection strategy based on multi-head attention: each head attends only to the most relevant context chunks, so extended sequences are handled without additional computational load. The method achieves 100% accuracy at 128k length on the passkey retrieval task and outperforms other methods at handling long contexts.
Abstract:
Large language models struggle to process lengthy inputs because of the computational demands of full attention.
LONGHEADS enhances LLMs' long-context ability with a chunk selection strategy built on multi-head attention.
Achieves 100% accuracy at 128k context length on the passkey retrieval task (a sketch of the task format is given below).
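The passkey retrieval task mentioned above is commonly built by hiding a short random passkey inside long filler text and asking the model to repeat it. The following is a minimal, hypothetical sketch of such a prompt builder in Python; the filler sentences, prompt wording, and token budgeting are assumptions for illustration, not the paper's exact setup.

```python
import random

def make_passkey_prompt(context_len_tokens=128_000, tokens_per_filler=10):
    # Hypothetical passkey retrieval example: hide a random 5-digit
    # passkey at a random depth inside filler text (wording assumed,
    # not taken from the paper).
    passkey = str(random.randint(10_000, 99_999))
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    n_fillers = context_len_tokens // tokens_per_filler  # rough token budget
    insert_at = random.randint(0, n_fillers)
    pieces = [filler] * n_fillers
    pieces.insert(insert_at, f"The pass key is {passkey}. Remember it. ")
    prompt = (
        "There is important info hidden in the text. Find and memorize it.\n"
        + "".join(pieces)
        + "\nWhat is the pass key? The pass key is"
    )
    return prompt, passkey
```

A response is typically counted correct only if the model's continuation reproduces the hidden passkey exactly.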
Introduction:
Processing long contexts in LLMs is challenging due to the computational cost of attention and out-of-distribution issues at lengths beyond the pre-training window.
Existing methods either restrict the attention window or rely on special operations to handle long sequences.
Method:
LONGHEADS exploits the multi-head attention already present in pre-trained LLMs to encode and generate long sequences efficiently, without additional training.
A chunk selection strategy ensures that each head processes only the relevant chunks, keeping computation within the pre-trained context length (see the sketch below).
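To make the chunk selection idea concrete, here is a simplified sketch for a single head and a single query token. It assumes fixed-size chunks and a mean-pooled chunk representation (the paper's actual chunk representation and selection rule may differ): the cached keys are split into chunks, each chunk is scored against the current query, and only the top-k chunks are kept for attention.

```python
import torch

def select_chunks_per_head(query, keys, chunk_size=256, k=4):
    # Simplified, assumed chunk selection for one attention head.
    # query: (head_dim,)         current token's query vector
    # keys:  (seq_len, head_dim) cached key vectors for this head
    seq_len, head_dim = keys.shape
    n_chunks = (seq_len + chunk_size - 1) // chunk_size

    # Pad so the sequence splits evenly into fixed-size chunks.
    pad = n_chunks * chunk_size - seq_len
    keys_padded = torch.nn.functional.pad(keys, (0, 0, 0, pad))
    chunks = keys_padded.view(n_chunks, chunk_size, head_dim)

    # Chunk representation: mean of the chunk's key vectors (a simplification).
    chunk_repr = chunks.mean(dim=1)                      # (n_chunks, head_dim)

    # Score chunks against the query and keep the top-k.
    scores = chunk_repr @ query                          # (n_chunks,)
    top = torch.topk(scores, k=min(k, n_chunks)).indices

    # Attention then runs only over the selected keys, so the head
    # never attends over more than k * chunk_size positions.
    selected_keys = chunks[top].reshape(-1, head_dim)
    return top, selected_keys
```

Because different heads can select different chunks, the model as a whole can in effect cover the full long input while each individual head works on a short, in-distribution window.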
Experiment:
Language modeling is evaluated on the PG19 and Proof-pile datasets using a sliding-window perplexity approach (a sketch follows this list).
Performance is compared with the NTK, LM-Infinite, and Position Interpolation methods.
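For reference on the evaluation setup, sliding-window perplexity scores each token with a bounded amount of preceding context by moving a fixed-size window over the document in strides. The sketch below assumes a Hugging Face-style causal LM whose forward pass returns a mean token loss when labels are provided; it is an illustrative approximation, not the paper's evaluation script.

```python
import torch

def sliding_window_ppl(model, input_ids, window=4096, stride=512, device="cuda"):
    # Illustrative sliding-window perplexity (assumes a Hugging Face-style
    # causal LM that returns .loss when given labels). Each step scores only
    # the tokens past the previous window's end, so every token is predicted
    # with up to `window` tokens of preceding context.
    seq_len = input_ids.size(1)
    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + window, seq_len)
        trg_len = end - prev_end                     # tokens newly scored here
        ids = input_ids[:, begin:end].to(device)
        labels = ids.clone()
        labels[:, :-trg_len] = -100                  # mask context-only positions
        with torch.no_grad():
            loss = model(ids, labels=labels).loss    # mean NLL over scored tokens
        nlls.append(loss * trg_len)
        prev_end = end
        if end == seq_len:
            break
    # Approximate per-token perplexity over the whole document.
    return torch.exp(torch.stack(nlls).sum() / prev_end)
```

Low perplexity at longer windows suggests the model is actually using the extra context rather than degrading on positions beyond its pre-trained length.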
Results:
LONGHEADS maintains low perplexity at extended context lengths where other methods degrade.
Stats
LONGHEADS achieved 100% accuracy at 128k context length on the passkey retrieval task.
Quotes
"LONGHEADS achieves nearly 100% accuracy across context lengths from 4k to 32k on the Passkey Retrieval task."
"Experiments demonstrate that LONGHEADS enables the LLMs to directly generalize to longer sequences."