Key Concepts
The paper proposes FEDPIT, a novel federated algorithm that leverages large language models' (LLMs') in-context learning capability to autonomously self-generate task-specific synthetic training data, improving federated few-shot performance while preserving privacy.
Summary
Abstract:
Instruction tuning is crucial for enhancing large language models' (LLMs') ability to generate human-aligned responses.
Federated instruction tuning (FEDIT) faces two challenges: limited local instruction data and vulnerability to training data extraction attacks.
FEDPIT utilizes LLMs' in-context learning capability to self-generate task-specific synthetic training data and applies parameter-isolated training, maintaining global parameters trained on synthetic data and local parameters trained on augmented local data.
Introduction:
Instruction tuning is essential for LLMs to generate human-aligned responses.
FEDIT leverages federated learning to train instruction-tuned LLMs across multiple data owners without centralizing their data.
Key challenges are limited local instruction data and vulnerability to training data extraction attacks.
Method:
FEDPIT combines two components, self-generation and parameter-isolated training, to enhance federated few-shot performance while preserving privacy.
Self-generation prompts the LLM with local few-shot examples as in-context demonstrations to produce new instruction-response pairs (see the first sketch below).
Parameter-isolated training preserves privacy by sharing only global parameters trained on synthetic data, while local parameters trained on augmented local data stay on the client (see the second sketch below).
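To make the self-generation step concrete, here is a minimal sketch assuming a generic text-completion callable `llm`; the helper names and prompt format are illustrative, not the authors' released code.

```python
import random
from dataclasses import dataclass

@dataclass
class Example:
    instruction: str
    response: str

def self_generate(llm, seeds: list[Example], n: int) -> list[Example]:
    """Synthesize n new instruction-response pairs by prompting an LLM
    with local few-shot examples as in-context demonstrations."""
    synthetic = []
    for _ in range(n):
        shots = random.sample(seeds, k=min(3, len(seeds)))
        demo = "\n\n".join(
            f"Instruction: {e.instruction}\nResponse: {e.response}" for e in shots
        )
        # Ask the LLM to continue the pattern with a new instruction ...
        new_instruction = llm(demo + "\n\nInstruction:").strip()
        # ... then ask it to answer the instruction it just produced.
        new_response = llm(f"Instruction: {new_instruction}\nResponse:").strip()
        synthetic.append(Example(new_instruction, new_response))
    return synthetic
```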
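And a minimal sketch of one parameter-isolated training round, assuming a hypothetical model interface and parameters represented as plain name-to-tensor dictionaries; only the global partition is ever uploaded to the server.

```python
def client_update(model, global_params, synthetic_data, augmented_data):
    """One FEDPIT-style client round (illustrative interface, not the paper's code)."""
    model.load(global_params)  # start from the server-aggregated weights

    # Global parameters are fit only on synthetic data, so the update
    # shared with the server carries no gradients from private examples.
    model.fit(synthetic_data, trainable="global")

    # Local parameters are fit on augmented local data (real few-shot
    # examples plus synthetic ones) and never leave the client.
    model.fit(augmented_data, trainable="local")

    return model.export(partition="global")  # the only thing uploaded

def server_aggregate(updates: list[dict]) -> dict:
    """FedAvg-style mean over the clients' shared global parameters."""
    n = len(updates)
    return {name: sum(u[name] for u in updates) / n for name in updates[0]}
```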
Experiment:
FEDPIT outperforms other federated algorithms at improving few-shot performance.
FEDPIT demonstrates stronger privacy-preserving capabilities against training data extraction attacks compared to FEDIT.
Related Work:
Previous research focuses on federated instruction tuning and training data extraction attacks in language models.
LLMs have been explored as training data generators to address data scarcity and privacy concerns.
Statistics
FEDIT assumes sufficient instruction data for model training, which is impractical in real-world applications.
FEDIT largely neglects the training data extraction attack, which can efficiently extract training data simply by querying the learned LLM, without any prior knowledge.
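To make the threat concrete, here is a toy sketch of such an extraction probe, assuming a hypothetical `generate_with_perplexity` interface: the attacker queries the tuned model with short prefixes and flags unusually confident continuations, which are likely memorized training text.

```python
def extraction_probe(model, prefixes: list[str], ppl_threshold: float = 5.0) -> list[str]:
    """Flag continuations the model is suspiciously confident about."""
    leaked = []
    for prefix in prefixes:
        # Hypothetical API: returns the continuation and its perplexity.
        continuation, ppl = model.generate_with_perplexity(prefix)
        if ppl < ppl_threshold:  # very low perplexity suggests memorization
            leaked.append(prefix + continuation)
    return leaked
```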
Quotes
"FEDPIT utilizes LLMs' in-context learning capability to self-generate task-specific synthetic data for training autonomously."
"Our method employs parameter-isolated training to maintain global parameters trained on synthetic data and local parameters trained on augmented local data."