Core Concepts
Differential privacy mechanisms can protect the tabular data used in in-context learning while keeping classification performance comparable to non-LLM baselines.
Abstract
This article explores the application of differential privacy (DP) to safeguard tabular data used in in-context learning (ICL). It introduces two frameworks, Local Differentially Private Tabular-based In-Context Learning (LDP-TabICL) and Global Differentially Private Tabular-based In-Context Learning (GDP-TabICL), and evaluates them on real-world datasets. LDP-TabICL applies randomized response to individual records, so each record is perturbed before it is used as a demonstration example. GDP-TabICL instead injects noise into group statistics computed over the dataset. The study shows that DP-based ICL can protect the underlying tabular data while achieving performance comparable to non-LLM baselines, especially under high privacy regimes.
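Randomized response, the mechanism named above for LDP-TabICL, can be illustrated with a minimal sketch. It assumes categorical attributes and uses generalized (k-ary) randomized response, which keeps the true value with probability e^ε / (e^ε + k - 1) and otherwise reports one of the other k - 1 values uniformly at random. The function names, the example attributes, and the even split of the privacy budget across attributes are illustrative assumptions, not details confirmed by the article.

```python
import math
import random


def k_randomized_response(value, domain, epsilon, rng=random):
    """Perturb one categorical value with k-ary randomized response.

    Keeps the true value with probability e^eps / (e^eps + k - 1),
    otherwise returns a uniformly chosen *different* value from the domain.
    """
    if value not in domain:
        raise ValueError(f"{value!r} is not in the attribute domain")
    k = len(domain)
    if k < 2:
        return value  # a single-valued attribute carries no information
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def perturb_record(record, domains, epsilon, rng=random):
    """Perturb every attribute of a record.

    Assumption: the total budget epsilon is split evenly across attributes
    (sequential composition). The article may allocate the budget differently.
    """
    eps_per_attr = epsilon / len(record)
    return {
        attr: k_randomized_response(val, domains[attr], eps_per_attr, rng)
        for attr, val in record.items()
    }


if __name__ == "__main__":
    # Hypothetical attribute domains, for illustration only.
    domains = {
        "workclass": ["private", "self-employed", "government"],
        "sex": ["female", "male"],
    }
    record = {"workclass": "private", "sex": "female"}
    print(perturb_record(record, domains, epsilon=1.0))
```

A smaller ε lowers the probability of keeping the true value, so the perturbed records that end up in the prompt reveal less about any individual, at the cost of noisier demonstration examples.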
Index
- Abstract
- Introduction to Large Language Models (LLMs) and In-Context Learning (ICL)
- Use of Tabular Data in ICL
- Risks of Using Tabular Data in LLMs
- Mitigating Privacy Risks with Differential Privacy (DP)
- Proposed Methods: LDP-TabICL and GDP-TabICL
- Experimental Evaluation
- Results and Analysis
- Conclusion
Stats
"We formulate two private ICL frameworks with provable privacy guarantees in both the local (LDP-TabICL) and global (GDP-TabICL) DP scenarios via injecting noise into individual records or group statistics, respectively."
"Our evaluations show that DP-based ICL can protect the privacy of the underlying tabular data while achieving comparable performance to non-LLM baselines, especially under high privacy regimes."
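The global setting described in the first quote, where noise is injected into group statistics rather than individual records, can be sketched as follows. This is a minimal illustration that assumes the group statistics are per-label counts of one categorical attribute and that the noise comes from the Laplace mechanism. Both choices, along with the function names and example attributes, are assumptions made for illustration, since the summary does not say which global mechanism or statistics GDP-TabICL uses.

```python
import random
from collections import Counter
from itertools import product


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_group_counts(records, attribute, label_key,
                       labels, values, epsilon, rng=None):
    """Release epsilon-DP counts for every (label, attribute value) cell.

    Each record falls into exactly one cell, so adding or removing one
    record changes the histogram by at most 1 (L1 sensitivity 1), and
    Laplace noise with scale 1/epsilon suffices. Cells are enumerated from
    the public domains (labels x values), not from the data, so empty cells
    also receive noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    counts = Counter((r[label_key], r[attribute]) for r in records)
    scale = 1.0 / epsilon
    return {
        cell: max(0.0, counts.get(cell, 0) + laplace_noise(scale, rng))
        for cell in product(labels, values)  # clamping is post-processing
    }


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    records = [
        {"income": ">50K", "sex": "male"},
        {"income": ">50K", "sex": "male"},
        {"income": "<=50K", "sex": "female"},
    ]
    print(noisy_group_counts(records, "sex", "income",
                             labels=["<=50K", ">50K"],
                             values=["female", "male"],
                             epsilon=1.0))
```

Unlike the local setting, this requires a trusted curator who sees the raw records. In exchange, the noise is added once per statistic instead of once per record, which generally preserves more utility for the same ε.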
Quotes
"The ease of usage and cost-benefit has motivated several organizations to integrate LLMs into their operations and services to supplement their private data with knowledge from the large corpus of texts that LLMs are trained on."
"Recent research has demonstrated that LLMs can leak information from the large text corpus used to train them and from the smaller pool of domain-specific data used to fine-tune them."
"We propose LDP-TabICL for generating demonstration examples that have formal local DP guarantees for use in tabular data classification via ICL."