Core Concept
Evaluating the privacy impact of non-private pre-processing in machine learning pipelines.
Summary
The work examines an often overlooked privacy cost: data-dependent pre-processing in differentially private machine learning pipelines. It introduces a framework built on Smooth DP and a sensitivity analysis of pre-processing algorithms, and uses it to evaluate how common pre-processing techniques (deduplication, quantization, data imputation, and PCA) affect the overall privacy guarantee. It also argues that privacy analysis must cover the entire ML pipeline, and proposes a PTR-inspired framework for unconditional privacy guarantees.
Abstract:
- Proposes a framework to evaluate additional privacy cost from non-private data-dependent pre-processing.
- Introduces technical notions: Smooth DP and sensitivity analysis for pre-processing algorithms.
- Evaluates impact of common pre-processing techniques on overall privacy guarantees.
Introduction:
- Discusses growing emphasis on user data privacy with Differential Privacy (DP).
- Highlights standard practices like data imputation, deduplication, and dimensionality reduction in ML.
- Explores how the dependencies introduced by pre-processing challenge privacy guarantees.
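To make the dependency issue concrete, here is a minimal sketch of how a single changed record can cascade through a non-private deduplication step. The greedy rule, the threshold, and the toy data are all assumptions chosen for illustration; they are not the source's definition of deduplication or of sensitivity.

```python
def greedy_dedup(points, threshold):
    """Hypothetical greedy rule: keep a point only if it is farther than
    `threshold` from every point kept so far."""
    kept = []
    for p in points:
        if all(abs(p - q) > threshold for q in kept):
            kept.append(p)
    return kept


original = [0.0, 1.0, 2.0, 3.0]
neighbor = [10.0, 1.0, 2.0, 3.0]  # differs from `original` in exactly one record

out_original = greedy_dedup(original, threshold=1.0)
out_neighbor = greedy_dedup(neighbor, threshold=1.0)

# Count records that appear in one output but not in the other.
changed = len(set(out_original) ^ set(out_neighbor))

print(out_original)  # [0.0, 2.0]
print(out_neighbor)  # [10.0, 1.0, 3.0]
print(changed)       # 5
```

In this toy case the two inputs differ in one record, yet the deduplicated outputs share no records at all. A DP mechanism applied afterwards therefore no longer sees neighboring datasets.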
Data Extraction:
- "A straightforward method to derive privacy guarantees for this pipeline is to use group privacy where the size of the group can be as large as the size of the dataset."
- "In contrast, Table 2 demonstrates that the RDP parameter ε of most private mechanisms increases to O(τε) for SRDP."
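The gap between the two quoted statements can be seen with a rough calculation. The sketch below uses the linear group-privacy rule for pure ε-DP (a group of size k gets kε) as a simplifying assumption; the quoted result concerns RDP, where group-privacy bounds are more involved. The value of τ and the hidden constant in O(τε) are hypothetical placeholders, not figures from the source.

```python
def group_privacy_epsilon(epsilon, group_size):
    """Pure epsilon-DP under group privacy: a group of size k gets k * epsilon."""
    return group_size * epsilon


def scaled_epsilon(epsilon, tau, constant=1.0):
    """Hypothetical stand-in for an O(tau * epsilon) bound.
    `constant` is an assumed hidden factor, not a value from the source."""
    return constant * tau * epsilon


epsilon = 0.5
n = 10_000  # dataset size; the worst-case group is the whole dataset
tau = 4     # hypothetical value of the parameter written as tau in the quote

print(group_privacy_epsilon(epsilon, n))  # 5000.0
print(scaled_epsilon(epsilon, tau))       # 2.0
```

Under these assumed numbers, the group-privacy route gives a guarantee too weak to be meaningful, while a bound that scales with τ stays on the same order as the original ε.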
Quotations:
- "Our work shows that the overall privacy cost of pre-processed DP pipeline can be bounded with minimal degradation in privacy guarantee."