Key Concepts
The author explores the use of differential privacy to protect tabular data in the context of in-context learning, proposing two frameworks, LDP-TabICL and GDP-TabICL, that offer formal privacy guarantees while maintaining performance.
Abstract
This work applies differential privacy mechanisms to safeguard tabular data used in in-context learning. It introduces two frameworks, LDP-TabICL and GDP-TabICL, and evaluates their effectiveness on real-world datasets. The study highlights the importance of protecting sensitive information while maintaining model performance.
Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) without retraining. Serializing tabular records into text enables ICL over tabular data, but placing raw records in prompts risks leaking sensitive information. Differential privacy (DP) is proposed as a solution to protect the tabular data used in ICL.
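The serialization step can be sketched as follows; the template and column names here are illustrative assumptions, not the paper's exact prompt format:

```python
# Hypothetical sketch: turn one tabular record into a natural-language
# demonstration for an LLM prompt. Template and columns are illustrative.
def serialize_row(row, label_col=None):
    """Serialize a record dict into text, appending the label as the answer."""
    parts = [f"The {col} is {val}." for col, val in row.items() if col != label_col]
    text = " ".join(parts)
    if label_col is not None:
        text += f" Answer: {row[label_col]}."
    return text

demo = serialize_row({"age": 39, "workclass": "State-gov", "income": ">50K"},
                     label_col="income")
# demo == "The age is 39. The workclass is State-gov. Answer: >50K."
```

Demonstrations like `demo` are concatenated with an unlabeled query record to form the ICL prompt; under DP, the records fed to this step are perturbed or aggregated first.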
Two DP-based frameworks are introduced: Local Differentially Private Tabular-based In-Context Learning (LDP-TabICL) and Global Differentially Private Tabular-based In-Context Learning (GDP-TabICL). These frameworks aim to generate demonstration examples for ICL while preserving the privacy of underlying tabular datasets.
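The two settings correspond to two standard DP primitives: in the local model, each data holder perturbs their own record before release; in the global model, a trusted curator adds noise to aggregate statistics. A minimal sketch of both, assuming randomized response and the Laplace mechanism (the paper's exact mechanisms and parameters are not reproduced here):

```python
# Sketch of the two DP primitives such frameworks typically build on.
import math
import random

def randomized_response(value, k, epsilon):
    """k-ary randomized response (a standard local-DP primitive): report the
    true categorical value with probability e^eps / (e^eps + k - 1),
    otherwise report one of the other k - 1 values uniformly at random."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return value
    return random.choice([v for v in range(k) if v != value])

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Global-DP Laplace mechanism: add Laplace(sensitivity / epsilon) noise
    to an aggregate (e.g., a per-group count) held by a trusted curator
    before demonstration examples are derived from it."""
    scale = sensitivity / epsilon
    # Laplace(0, b) sampled as the difference of two Exponential(1/b) draws.
    return true_value + scale * (random.expovariate(1.0) - random.expovariate(1.0))
```

In the local setting, demonstrations are reconstructed from many independently perturbed records; in the global setting, they are derived from noised aggregates, so no single individual's record is exposed.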
Evaluation on eight real-world tabular datasets shows that DP-based ICL can protect the privacy of the underlying data while achieving performance comparable to non-LLM baselines. The study emphasizes the need for privacy protection in machine learning applications involving sensitive data.
Key metrics or figures:
ϵ values: 1, 5, 10, 25, 50
Dataset sizes: adult (48842 rows), bank (45211 rows), blood (748 rows), calhousing (20640 rows), car (1728 rows), diabetes (768 rows), heart (918 rows), jungle (44819 rows)
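To build intuition for the ε values listed above, the snippet below computes the truthful-report probability p = e^ε / (e^ε + 1) of binary randomized response at each ε. This is an illustrative mechanism for interpreting the budgets, not necessarily the one the paper uses:

```python
import math

# Truthful-report probability of binary randomized response at each budget.
for eps in [1, 5, 10, 25, 50]:
    p_true = math.exp(eps) / (math.exp(eps) + 1)
    print(f"eps={eps:>2}: truthful-report probability = {p_true:.6f}")
```

At ε = 1 the response is already truthful about 73% of the time, and by ε = 10 the noise is almost negligible, which is why larger budgets trade privacy for accuracy.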
Quotes
"We propose LDP-TabICL for generating demonstration examples that have formal local DP guarantees for use in tabular data classification via ICL."
"Our evaluations show that DP-based ICL can protect the privacy of the underlying tabular data while achieving comparable performance to non-LLM baselines."