The paper presents the first security analysis of the performance optimization techniques used by modern LLM serving systems that handle multiple users or applications concurrently. It shows that these optimizations introduce unique timing side channels that leak significant information.
The key findings are:
Timing side channels arise because LLM serving systems share the semantic cache and the KV cache across requests to reduce inference costs. An attacker who observes response timing can exploit these shared caches to infer proprietary system prompts or sensitive prompts submitted by peer users (a minimal timing sketch appears after these findings).
The paper proposes novel attack strategies to exploit these side channels, enabling two attacks: a prompt stealing attack and a peeping neighbor attack (see the prefix-probing sketch below).
Experiments on open-source projects and popular online LLM services demonstrate the feasibility and effectiveness of the attacks.
Preliminary solutions are proposed to mitigate these risks, such as sharing the KV cache in larger units and anonymizing privacy-related information in user inputs before semantic search (an illustrative anonymization sketch appears below).
The findings underscore the urgent need to address potential information leakage in LLM serving infrastructures as they become widely deployed.
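The timing signal behind the KV-cache side channel can be illustrated with a short sketch. This is a minimal, hypothetical example rather than the paper's tooling: it assumes an OpenAI-compatible streaming endpoint at a local URL, and the model name and threshold are placeholders. The idea is that a prompt whose prefix is already cached skips part of prefill, so its time-to-first-token is measurably lower.

```python
# Minimal sketch of the timing measurement behind the KV-cache side channel.
# Assumes an OpenAI-compatible streaming endpoint; the URL, model name, and
# threshold below are hypothetical, not taken from the paper.
import time
import requests

API_URL = "http://localhost:8000/v1/completions"  # hypothetical serving endpoint

def time_to_first_token(prompt: str) -> float:
    """Send a streaming request and return the latency until the first token arrives."""
    start = time.monotonic()
    with requests.post(
        API_URL,
        json={"model": "example-model", "prompt": prompt, "max_tokens": 1, "stream": True},
        stream=True,
        timeout=30,
    ) as resp:
        for _ in resp.iter_lines():
            break  # the first streamed chunk corresponds to the first token
    return time.monotonic() - start

def measure(prompt: str, trials: int = 5) -> float:
    """Average the time-to-first-token over several trials to smooth out network jitter."""
    return sum(time_to_first_token(prompt) for _ in range(trials)) / trials

# If `probe` shares a long prefix with a previously processed prompt, prefix
# caching lets the server skip prefill for those tokens, so its latency drops.
baseline = measure("zqxv unrelated filler text with no shared prefix")
probe = measure("You are a helpful assistant for ACME Corp.")
print("likely cache hit" if probe < 0.8 * baseline else "likely cache miss")
```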
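Building on such a cache-hit oracle, the prompt stealing attack can be pictured as an incremental prefix search: guess the next word, test whether the extended prefix still hits the victim's cached entry, and keep extending. The sketch below is a simplification under assumed interfaces (`is_cache_hit` and `candidates` are hypothetical placeholders), not the paper's exact algorithm.

```python
# Hypothetical sketch of a prefix-probing loop for the prompt stealing attack.
# It relies on a cache-hit oracle such as one built from the timing
# measurements in the previous sketch.
from typing import Callable, List

def recover_prefix(
    is_cache_hit: Callable[[str], bool],  # oracle built from timing measurements
    candidates: List[str],                # candidate next words to try
    max_words: int = 20,
) -> str:
    """Greedily extend a guessed prompt prefix while the cache keeps hitting."""
    recovered = ""
    for _ in range(max_words):
        extended = None
        for word in candidates:
            guess = (recovered + " " + word).strip()
            # A cache hit suggests the victim's cached prompt starts with `guess`,
            # so the guessed word is likely correct and we keep extending.
            if is_cache_hit(guess):
                extended = guess
                break
        if extended is None:
            break  # no candidate matched; stop extending
        recovered = extended
    return recovered
```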
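For the semantic-cache mitigation, a rough sketch of anonymizing privacy-related spans before embedding and lookup is shown below. The regular expressions and placeholder tags are illustrative assumptions, not the paper's implementation.

```python
# Rough sketch of the proposed mitigation: strip privacy-related details from a
# user prompt before it is embedded and matched against the semantic cache.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "number": re.compile(r"\b\d{4,}\b"),
}

def anonymize(prompt: str) -> str:
    """Replace privacy-related spans with placeholder tags before semantic search."""
    for label, pattern in PII_PATTERNS.items():
        prompt = pattern.sub(f"<{label}>", prompt)
    return prompt

# The semantic cache is then keyed on the anonymized text, so a cache hit no
# longer reveals another user's personal details.
print(anonymize("Email john.doe@example.com about invoice 123456"))
# -> "Email <email> about invoice <number>"
```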
Key insights distilled from the paper by Linke Song, ... (arxiv.org, 10-01-2024): https://arxiv.org/pdf/2409.20002.pdf