Core Concept
The API-based AUG-PE algorithm generates high-quality DP synthetic text without any model training, achieving utility competitive with, and in some cases better than, traditional DP finetuning methods.
Summary
1. Abstract:
- Text data's value in ML algorithms.
- Privacy concerns with private text data.
- Importance of generating synthetic text with DP guarantee.
- Introduction of Private Evolution (PE) algorithm for DP synthetic images.
- Proposal of AUG-PE for text generation without model training.
2. Introduction:
- NLP applications rely on private text data.
- DP synthetic text as a solution.
- Challenges with DP finetuning powerful LLMs.
- Introduction of PE algorithm for DP synthetic data.
3. Method:
- Overview of PE algorithm for DP synthetic data.
- Design of AUG-PE for text generation.
- Adaptive text lengths in VARIATION_API.
- Embeddings calculation and DP nearest neighbor histogram.
- Sample selection and generation process.
4. Experiments:
- Evaluation on Yelp, OpenReview, and PubMed datasets.
- Comparison of AUG-PE with DP-FT-GENERATOR and DP-FT-DOWNSTREAM.
- Performance analysis across different privacy budgets.
- Efficiency comparison between AUG-PE and DP-FT-GENERATOR.
5. Understanding the Properties of AUG-PE:
- Analysis of text lengths, compatibility with LLMs, and behaviors under data scaling.
- Evaluation of downstream model performance with synthetic text.
6. Validating the Design of AUG-PE:
- Comparison of AUG-PE with original PE algorithm.
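The "DP nearest neighbor histogram" and "sample selection" items under Method can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the use of squared Euclidean distance, the `noise_std` parameter, the clipping of negative counts, and sampling with replacement are all assumptions made for illustration. Privacy accounting (mapping `noise_std` to an (epsilon, delta) budget across iterations) is not shown.

```python
import numpy as np

def dp_nn_histogram(private_emb, synthetic_emb, noise_std, rng):
    """Each private embedding votes for its nearest synthetic embedding;
    Gaussian noise is then added to the vote counts (illustrative only)."""
    # Pairwise squared Euclidean distances, shape (n_private, n_synthetic).
    diffs = private_emb[:, None, :] - synthetic_emb[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Index of the closest synthetic sample for every private sample.
    nearest = dists.argmin(axis=1)
    # One vote per private sample, so adding or removing one private
    # sample changes exactly one count by 1.
    counts = np.bincount(nearest, minlength=len(synthetic_emb)).astype(float)
    return counts + rng.normal(0.0, noise_std, size=counts.shape)

def select_samples(noisy_hist, num_select, rng):
    """Draw synthetic sample indices in proportion to their noisy votes."""
    probs = np.clip(noisy_hist, 0.0, None)  # noise can push counts below zero
    if probs.sum() == 0.0:
        probs = np.ones_like(probs)  # fall back to uniform sampling
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=num_select, replace=True, p=probs)

rng = np.random.default_rng(0)
private = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
synthetic = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
hist = dp_nn_histogram(private, synthetic, noise_std=1.0, rng=rng)
chosen = select_samples(hist, num_select=4, rng=rng)
```

In a PE-style loop, the selected samples would then be passed to an API such as VARIATION_API to produce the next generation of candidates. That call is left out here because it depends on the chosen LLM.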
Statistics
"Our results demonstrate that AUG-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines."
"AUG-PE can generate DP synthetic text that achieves comparable or even better performance than finetuning baselines in some cases."
"AUG-PE can achieve higher accuracy, especially on challenging datasets OpenReview and PubMed, outperforming DP-FT-GENERATOR by a notable margin."
Quotes
"Generating synthetic replicas of private text data with a formal privacy guarantee offers a promising solution."
"AUG-PE can effectively leverage the inherent knowledge in stronger LLMs to generate higher-quality DP synthetic texts."