The key highlights and insights from the paper are:
Machine unlearning techniques have been proposed to remove the influence of specific training data from machine learning models in order to fulfill the "right to be forgotten". However, existing studies focus mainly on the efficiency and efficacy of unlearning methods, while neglecting the privacy vulnerabilities that the unlearning process itself introduces.
The authors propose unlearning inversion attacks that can reveal the feature and label information of unlearned data by exploiting the difference between the original and unlearned models.
For feature recovery, a server-side attacker with white-box access to both models can exploit the difference in their parameters to reconstruct the features of the unlearned data, particularly under approximate unlearning, where the unlearned model is obtained by a small parameter update to the original model.
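To make the intuition concrete, below is a minimal PyTorch sketch of such a parameter-difference inversion. It assumes the unlearned model was produced by an approximate, gradient-based update, so the parameter difference acts as a proxy for the gradient of the loss on the unlearned sample; the model interface, input shape, and the cosine-distance matching objective are illustrative assumptions, not the paper's exact algorithm.

import torch
import torch.nn.functional as F

def invert_unlearned_feature(model, original_params, unlearned_params,
                             input_shape=(1, 3, 32, 32), num_classes=10,
                             steps=500, lr=0.1):
    # The parameter difference between the original and unlearned model is
    # treated as the "target gradient" that the reconstructed input should
    # reproduce (hypothetical single-sample, gradient-step unlearning).
    target_grads = [po - pu for po, pu in zip(original_params, unlearned_params)]

    dummy_x = torch.randn(input_shape, requires_grad=True)     # candidate feature
    dummy_y = torch.randn(1, num_classes, requires_grad=True)  # soft candidate label
    optimizer = torch.optim.Adam([dummy_x, dummy_y], lr=lr)

    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(dummy_x), F.softmax(dummy_y, dim=-1))
        grads = torch.autograd.grad(loss, list(model.parameters()), create_graph=True)

        # Cosine distance between the gradients induced by the candidate input
        # and the observed parameter difference; minimizing it pushes dummy_x
        # toward the features of the unlearned sample.
        mismatch = sum(1 - F.cosine_similarity(g.flatten(), t.flatten(), dim=0)
                       for g, t in zip(grads, target_grads))
        mismatch.backward()
        optimizer.step()

    return dummy_x.detach()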
For label inference, a user with only black-box query access can compare the prediction outputs of the original and unlearned models to infer the label of the unlearned data, and this works even under exact unlearning.
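The following is a minimal sketch of that comparison, assuming black-box access to the prediction probabilities of both models. The probing strategy shown here, averaging per-class confidence shifts over a small probe set and picking the class with the largest drop, is an illustrative assumption rather than the paper's exact procedure.

import torch

@torch.no_grad()
def infer_unlearned_label(original_model, unlearned_model, probe_inputs):
    # Softmax outputs of both models on the same probe samples.
    p_orig = torch.softmax(original_model(probe_inputs), dim=-1)
    p_unl = torch.softmax(unlearned_model(probe_inputs), dim=-1)

    # Average per-class confidence drop after unlearning; the class whose
    # confidence falls the most is taken as the unlearned sample's label.
    confidence_drop = (p_orig - p_unl).mean(dim=0)
    return confidence_drop.argmax().item()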
Extensive experiments on benchmark datasets and model architectures validate the effectiveness of the proposed unlearning inversion attacks in uncovering the private information of unlearned data.
The authors also discuss three potential defense methods, but each incurs an unacceptable trade-off between defense effectiveness and model utility.
The study highlights the need to carefully design unlearning mechanisms that do not leak information about the unlearned data.
Source: Hongsheng Hu et al., arxiv.org, 04-05-2024, https://arxiv.org/pdf/2404.03233.pdf