SnapKV: An Efficient Approach to Minimize Key-Value Cache Size in Large Language Models
SnapKV is a fine-tuning-free approach that reduces the Key-Value (KV) cache size in Large Language Models (LLMs) while maintaining performance comparable to the full cache on real-world tasks. Instead of storing the KV pairs of every prompt token, SnapKV uses the attention paid by the final window of the prompt to identify and retain only the most important KV positions before generation begins.
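The selection idea can be sketched as follows: score each prefix position by the attention it receives from an "observation window" of the last few prompt queries, smooth the scores with 1-D pooling to preserve local context, and keep only a fixed budget of top-scoring positions plus the window itself. This is an illustrative sketch, not the library's actual API; the function name, parameter names, and defaults are assumptions.

```python
import numpy as np

def snap_kv_select(attn_weights, window_size=8, kernel_size=5, budget=4):
    """Pick which prefix KV positions to keep for one attention head.

    attn_weights: (num_queries, num_keys) attention matrix, where the last
    `window_size` queries form the observation window. Returns the sorted
    indices of the kept KV positions (top-`budget` prefix tokens plus the
    observation window). Illustrative sketch; names are assumptions.
    """
    prefix_len = attn_weights.shape[1] - window_size
    # Score each prefix key by total attention from the observation window.
    scores = attn_weights[-window_size:, :prefix_len].sum(axis=0)
    # 1-D average pooling (edge-padded) so isolated spikes keep neighbors.
    pad = kernel_size // 2
    padded = np.pad(scores, pad, mode="edge")
    pooled = np.convolve(padded, np.ones(kernel_size) / kernel_size, mode="valid")
    # Keep the top-`budget` pooled positions, then append the whole window.
    keep_prefix = np.sort(np.argsort(pooled)[-budget:])
    window_idx = np.arange(prefix_len, prefix_len + window_size)
    return np.concatenate([keep_prefix, window_idx])
```

Only the selected rows of the K and V caches are retained per head, so memory scales with the budget plus the window rather than with the full prompt length.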