핵심 개념
Large language models can improve their performance by querying more capable remote models, but this poses a significant privacy risk if the local model has access to sensitive data. This work introduces privacy-preserving techniques that allow local models to leverage remote models without revealing private information.
초록
The content discusses the challenge of enabling large language models (LLMs) to collaborate and learn from each other while preserving privacy. LLMs are powerful but come with high inference costs and need to run in data centers far from local contexts where private data is available. Conversely, local models that can run on user devices have more limited capabilities.
The authors introduce the first privacy-preserving approach to cascade systems, where a local model (the student) can query a more capable remote model (the teacher) for help, without revealing any private information. They propose three methods for the student to generate queries to the teacher:
- Describing the problem the student is facing in a high-level way.
- Generating similar, but novel, unlabeled examples that the teacher can label.
- Replacing entities in the original examples to mask private information.
To evaluate privacy, the authors introduce two metrics: the entity leak metric that counts entities leaked from the original examples, and the mapping leak metric that measures how well a curious teacher with auxiliary information could map the student's queries back to the original examples.
Experiments on diverse datasets show that the authors' methods can significantly improve the student's performance compared to baselines, while minimizing privacy leakage. Method 3 (replacing entities) generally achieves the best quality results while leaking few entities, while Method 2 (generating new examples) with grouping offers the strongest privacy metrics.
통계
Two thirds of Jana's puppies are Pomeranians.
One third of the Pomeranians are girls.
There are 6 Pomeranian girls.
Raul had $87 and bought 8 comics at $4 each.
Emily had $92 and bought 4 ice cream cones at $3 each.
The pool is 14 feet wide, 25 feet long, and 4 feet deep.
The cost for the pool company to fill the pool is $0.10 per gallon.
인용구
"Cascades are a common type of machine learning systems in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself."
"Serving stacks for large language models (LLMs) increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs."
"Applying cascade systems in situations where the local model has access to sensitive data constitutes a significant privacy risk for users since such data could be forwarded to the remote model."