Core Concepts
Hierarchical cOntext MERging (HOMER) is a novel, training-free technique that extends the context limit of pre-trained large language models by combining a divide-and-conquer strategy with hierarchical merging of context embeddings and token reduction.
Abstract
The paper presents a novel technique called Hierarchical cOntext MERging (HOMER) to address the limited context length of large language models (LLMs). The key ideas are:
Division of long input into manageable chunks:
The long input is divided into uniform chunks, and the initial and concluding parts of the prompt are attached to each chunk as a shared prefix and suffix so that every chunk carries the necessary context.
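The sketch below gives a rough illustration of this chunking step on plain lists of token IDs; the function name and parameters are illustrative assumptions, not the paper's actual implementation, which operates inside the transformer.

```python
from typing import List

def split_with_shared_affixes(tokens: List[int], prefix_len: int,
                              suffix_len: int, num_chunks: int) -> List[List[int]]:
    """Divide a long token sequence into uniform chunks, attaching the
    prompt's opening and closing parts to every chunk as shared context."""
    prefix = tokens[:prefix_len]
    suffix = tokens[len(tokens) - suffix_len:]
    middle = tokens[prefix_len:len(tokens) - suffix_len]
    chunk_size = -(-len(middle) // num_chunks)  # ceiling division -> uniform chunks
    return [prefix + middle[i:i + chunk_size] + suffix
            for i in range(0, len(middle), chunk_size)]

# Example: 100 tokens, an 8-token shared prefix and suffix, 4 uniform chunks.
chunks = split_with_shared_affixes(list(range(100)), 8, 8, 4)
```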
Hierarchical merging of chunk embeddings:
The chunks are processed independently in the early transformer layers.
In the intermediate layers, adjacent chunks are progressively merged by concatenation into larger chunks.
To keep the chunk length bounded, a token-reduction step shortens each chunk before every merging stage.
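The following sketch shows one merging stage, assuming each chunk's embeddings are a NumPy array of shape (tokens, hidden_dim). The embedding-norm importance score is only a stand-in for the paper's actual token-reduction criterion.

```python
import numpy as np

def reduce_tokens(chunk: np.ndarray, keep: int) -> np.ndarray:
    """Shorten a chunk before merging. The norm-based score is purely
    illustrative; HOMER derives token importance differently."""
    if chunk.shape[0] <= keep:
        return chunk
    scores = np.linalg.norm(chunk, axis=-1)
    kept = np.sort(np.argsort(scores)[-keep:])   # keep survivors in original order
    return chunk[kept]

def merge_stage(chunks: list, keep: int) -> list:
    """One hierarchical merging stage: shorten every chunk, then concatenate
    adjacent pairs so the number of chunks halves."""
    shortened = [reduce_tokens(c, keep) for c in chunks]
    return [np.concatenate(shortened[i:i + 2], axis=0)
            for i in range(0, len(shortened), 2)]

# Repeating merge_stage across the intermediate layers eventually leaves a
# single merged chunk of bounded length.
```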
Propagative refinement of lower-layer embeddings:
After hierarchical merging, the per-layer embeddings form a trapezoid: the upper layers are more concise because more reduction has been applied to them, while the lower layers remain long.
Propagative refinement prunes the lower-layer embeddings using the token-selection decisions made in the upper layers, so that every layer ends up equally concise.
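A minimal sketch of this refinement is given below, under the assumption that each reduction stage recorded which token indices of the layer below it kept; composing those index maps from the top down prunes every lower layer to the tokens that ultimately survive. The data layout and names are assumptions for illustration.

```python
import numpy as np

def propagative_refinement(layer_embeds, kept_per_stage):
    """Prune lower-layer embeddings to match the upper layers' decisions.

    layer_embeds:   per-layer arrays of shape (tokens_l, dim), lowest layer
                    first; lower layers are longer (the trapezoid).
    kept_per_stage: kept_per_stage[l] maps each token of layer l+1 to its
                    index among layer l's tokens.
    """
    refined = list(layer_embeds)
    surviving = kept_per_stage[-1]               # survivors, indexed in the layer below the top
    for l in range(len(kept_per_stage) - 1, -1, -1):
        refined[l] = layer_embeds[l][surviving]  # keep only the surviving tokens
        if l > 0:
            surviving = kept_per_stage[l - 1][surviving]  # map one layer further down
    return refined
```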
Optimized computation order:
An optimized computation order is proposed in which hierarchical merging is carried out as a depth-first traversal of a binary tree of chunks.
Because a depth-first traversal keeps only a logarithmic number of partial results in memory at once, the memory requirement scales logarithmically with the input length, making HOMER especially favorable for resource-constrained environments.
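The depth-first order can be sketched as the recursion below: one half of the chunk range is fully merged before the other half is even materialized. The callback names are placeholders for the chunk-encoding and reduce-then-merge steps described above.

```python
def homer_dfs(chunks, encode_chunk, reduce_and_merge):
    """Hierarchical merging as a depth-first binary-tree traversal."""
    def visit(lo, hi):
        if hi - lo == 1:
            return encode_chunk(chunks[lo])     # leaf: process a single chunk
        mid = (lo + hi) // 2
        left = visit(lo, mid)                   # finish the left subtree first...
        right = visit(mid, hi)                  # ...before touching the right one
        return reduce_and_merge(left, right)    # children can be freed afterwards
    return visit(0, len(chunks))

# With n chunks, the recursion depth (and hence the number of intermediate
# merged results held simultaneously) grows only logarithmically in n.
```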
The experiments demonstrate that HOMER effectively extends the context limit of pre-trained LLMs, maintaining high performance on downstream tasks such as passkey retrieval and question answering even for inputs up to 32 times longer than the original context limit. HOMER also shows superior language-modeling fluency on long documents compared to baseline methods, and it is computationally efficient, reducing peak memory usage by 70% relative to the baselines.
Stats
"The context limit of LLMs has become a critical problem, and a significant challenge is addressing the quadratic computational burden of the self-attention mechanism."
"HOMER can effectively extend the context limit of pre-trained LLMs, maintaining high performance on downstream tasks like passkey retrieval and question answering, even for inputs up to 32 times longer than the original context limit."
"HOMER shows superior language modeling fluency on long documents compared to baseline methods."
"HOMER achieves a 70% reduction in peak memory usage compared to the baselines."