Abstract
Goldfish is a novel framework for efficient and effective federated unlearning: it removes a user's data from a trained machine learning model without retraining from scratch.
Core Concepts
The Goldfish framework consists of four key modules:
Basic Model: Goldfish utilizes a teacher-student knowledge distillation approach to enable fast retraining. The teacher model encompasses knowledge from the full dataset, while the student model selectively learns from the remaining dataset after data deletion, effectively forgetting the removed data.
Loss Function: Goldfish introduces a novel loss function with three components: a hard loss (the discrepancy between predictions and the true labels on the remaining dataset), a confusion loss (reducing the bias of predictions on the removed dataset), and a distillation loss (improving the generalization of the student model). A combined sketch of the basic model and this loss appears after this list.
Optimization: Goldfish employs two techniques to improve efficiency: early termination of training guided by the empirical risk, and partitioning of client data into small shards so that only the shards containing deleted data need to be retrained (see the sketch after this list).
Extension: Goldfish addresses client data heterogeneity through an adaptive distillation temperature, and handles variation in local model quality through adaptive weight assignment during aggregation (see the sketch after this list).
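
A minimal PyTorch sketch of the teacher-student basic model and the three-part loss. The temperature T, the mixing weights alpha and beta, and the uniform-target form of the confusion loss are illustrative assumptions, not the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def goldfish_loss(student_logits_r, labels_r, teacher_logits_r,
                      student_logits_f, T=2.0, alpha=0.5, beta=0.5):
        # Hard loss: discrepancy between student predictions and the true
        # labels on the remaining dataset D_c^r.
        hard = F.cross_entropy(student_logits_r, labels_r)
        # Distillation loss: match the teacher's softened outputs on the
        # remaining data to improve the student's generalization.
        distill = F.kl_div(F.log_softmax(student_logits_r / T, dim=1),
                           F.softmax(teacher_logits_r / T, dim=1),
                           reduction="batchmean") * (T * T)
        # Confusion loss (assumed form): push student predictions on the
        # removed dataset D_c^f toward the uniform distribution, erasing
        # any bias toward the deleted samples.
        num_classes = student_logits_f.size(1)
        uniform = torch.full_like(student_logits_f, 1.0 / num_classes)
        confusion = F.kl_div(F.log_softmax(student_logits_f, dim=1),
                             uniform, reduction="batchmean")
        return hard + alpha * distill + beta * confusion

During unlearning, the student minimizes this loss: the hard and distillation terms are computed on the remaining data, while the confusion term is computed on the removed data.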
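A sketch of the two efficiency mechanisms in the optimization module, assuming PyTorch-style training; the shard assignment by shuffling and the risk threshold value are illustrative choices:

    import random

    def partition_into_shards(dataset_size, num_shards, seed=0):
        # Split a client's sample indices into small shards; after an
        # unlearning request, only the shards containing deleted samples
        # are retrained.
        indices = list(range(dataset_size))
        random.Random(seed).shuffle(indices)
        return [indices[i::num_shards] for i in range(num_shards)]

    def train_with_early_termination(model, loader, optimizer, loss_fn,
                                     risk_threshold=0.05, max_epochs=50):
        # Terminate as soon as the empirical risk (mean training loss)
        # falls below a threshold, instead of exhausting the epoch budget.
        for _ in range(max_epochs):
            total, n = 0.0, 0
            for x, y in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                optimizer.step()
                total += loss.item() * len(y)
                n += len(y)
            if total / n < risk_threshold:
                break
        return model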
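A sketch of the extension module. The heterogeneity score, the linear temperature scaling, and the quality-score weighting are assumed instantiations, since the summary above does not give the exact formulas:

    import torch

    def adaptive_temperature(base_T, heterogeneity):
        # Assumed rule: soften distillation targets more for clients whose
        # local data diverges more from the global distribution.
        return base_T * (1.0 + heterogeneity)

    def aggregate(client_states, quality_scores):
        # Weight each client's model parameters by a quality score (e.g.,
        # local validation accuracy), normalized to sum to one. Assumes
        # floating-point parameter tensors.
        weights = torch.tensor(quality_scores, dtype=torch.float32)
        weights = weights / weights.sum()
        agg = {k: torch.zeros_like(v) for k, v in client_states[0].items()}
        for w, state in zip(weights, client_states):
            for k, v in state.items():
                agg[k] += w * v
        return agg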
Comprehensive experiments on public datasets demonstrate that Goldfish effectively resists backdoor attacks while achieving better efficiency and accuracy than state-of-the-art methods.
Stats
The remaining dataset D_c^r is much larger than the removed dataset D_c^f, i.e., |D_c^r| ≫ |D_c^f|.
Quotes
"To address the challenge of low validity in existing machine unlearning algorithms, we propose a novel loss function. It takes into account the loss arising from the discrepancy between predictions and actual labels in the remaining dataset. Simultaneously, it takes into consideration the bias of predicted results on the removed dataset. Moreover, it accounts for the confidence level of predicted results."
"To enhance efficiency, we adopt knowledge distillation technique in basic model and introduce an optimization module that encompasses the early termination mechanism guided by empirical risk and the data partition mechanism."