Prompt Cache: Modular Attention Reuse for Accelerating Large Language Model Inference
Prompt Cache is an approach that accelerates inference for large language models by reusing attention states across different prompts. Text segments that recur across many prompts, such as system messages or shared documents, are treated as modules with fixed position ranges; because attention key/value states depend on token positions, this modular, positionally coherent prompt structure lets a module's states be computed once and reused in any prompt that includes it.
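To make the idea concrete, here is a minimal Python sketch of modular attention-state reuse. It is not the paper's implementation: the names `compute_kv_states`, `PromptModuleCache`, and `encode_prompt` are hypothetical, and the position-dependent math is a toy stand-in for a real transformer's key/value tensors.

```python
import numpy as np

D = 8  # hypothetical head dimension for this toy example

def compute_kv_states(token_ids, start_pos):
    """Stand-in for a transformer forward pass: returns toy per-token
    key/value states that depend on absolute position (start_pos + i),
    mimicking how real attention states are position-dependent."""
    pos = np.arange(start_pos, start_pos + len(token_ids))[:, None]
    tok = np.asarray(token_ids)[:, None]
    dims = np.arange(D)
    keys = np.sin(0.1 * tok + 0.01 * pos + dims)
    values = np.cos(0.1 * tok + 0.01 * pos + dims)
    return keys, values

class PromptModuleCache:
    """Caches KV states per (module tokens, start position): any prompt
    that places the same module at the same positions reuses the states
    verbatim instead of recomputing attention over the module."""
    def __init__(self):
        self._cache = {}

    def get(self, module_tokens, start_pos):
        key = (tuple(module_tokens), start_pos)
        if key not in self._cache:
            self._cache[key] = compute_kv_states(module_tokens, start_pos)
        return self._cache[key]

def encode_prompt(modules, cache):
    """Assembles a prompt's full KV sequence from cached per-module
    segments. `modules` is a list of (token_ids, start_pos) pairs with
    fixed, non-overlapping position ranges -- the positionally
    coherent layout that makes cross-prompt reuse valid."""
    keys, values = zip(*(cache.get(toks, pos) for toks, pos in modules))
    return np.concatenate(keys), np.concatenate(values)

if __name__ == "__main__":
    cache = PromptModuleCache()
    system = ([1, 2, 3], 0)   # shared module at positions 0..2
    doc = ([7, 8], 3)         # shared module at positions 3..4
    q1 = ([42], 5)            # per-request suffix
    q2 = ([43], 5)
    k1, _ = encode_prompt([system, doc, q1], cache)
    k2, _ = encode_prompt([system, doc, q2], cache)  # reuses system + doc
    assert np.allclose(k1[:5], k2[:5])  # shared module states are identical
    print("cached segments:", len(cache._cache))  # 4: system, doc, q1, q2
```

Note the design point the sketch illustrates: the cache is keyed on both the module's tokens and its start position. Since attention states are position-dependent, two prompts can share a module's cached states only if they agree on where the module sits, which is why a fixed, positionally coherent layout is required for reuse.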