Retrieval Heads: The Key to Precise Information Retrieval in Long-Context Language Models
Retrieval heads are a small set of attention heads in long-context language models. They locate the relevant information in the input context and redirect it to the output, which enables precise factual responses.
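One way to make this retrieve-and-redirect behavior concrete is to score each attention head by how often it copies tokens from the relevant span. The sketch below is illustrative only and does not come from the text above. It assumes a needle-in-a-haystack setup in which, for every decoding step, we already know the context position the head attends to most strongly. The function name `retrieval_score` and all parameter names are hypothetical.

```python
from typing import Sequence


def retrieval_score(
    attn_argmax: Sequence[int],   # per decoding step: context position with the head's highest attention
    generated: Sequence[str],     # tokens emitted at each decoding step
    context: Sequence[str],       # tokens of the input context
    needle_span: range,           # positions in `context` that hold the relevant information
) -> float:
    """Fraction of needle tokens that this head attended to while that same token was generated."""
    copied = set()
    for step, token in enumerate(generated):
        pos = attn_argmax[step]
        # Count a copy only when the head looks inside the needle
        # and the token it looks at is the token being generated.
        if pos in needle_span and context[pos] == token:
            copied.add(pos)
    return len(copied) / len(needle_span)


# Toy example: the needle is the three digits at positions 3..5.
context = ["the", "code", "is", "7", "4", "1", "today"]
score = retrieval_score(
    attn_argmax=[3, 4, 0],        # the head tracks the needle for two of three steps
    generated=["7", "4", "1"],
    context=context,
    needle_span=range(3, 6),
)
print(score)
```

Under this assumed definition, a head that scores near 1 across many contexts behaves like a retrieval head, while a head that scores near 0 does not take part in copying.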