
Privacy Backdoors: Stealing Training Data from Corrupted Pretrained Models


Core Concepts
An attacker can tamper with the weights of a pretrained machine learning model to create "privacy backdoors" that enable the reconstruction of individual training samples used to finetune the model.
Abstract

The paper introduces a new attack vector in the machine learning supply chain, where an attacker can compromise the privacy of finetuning data by tampering with the weights of a pretrained model. The key insights are:

  1. The attacker can create "data traps" in the pretrained model that selectively write individual training samples to the model weights during finetuning. These trapped data points can then be extracted by reading from the finetuned model's weights (a toy sketch of this write-to-weights effect follows the list).

  2. The authors design a "single-use" backdoor construction that activates on a specific input, writes that input to the weights, and then becomes inactive to prevent further alteration of the weights. This allows the backdoor to survive the entire finetuning process.

  3. The authors demonstrate the effectiveness of their privacy backdoors on popular transformer models like ViT and BERT, showing they can reconstruct dozens of finetuning examples across various downstream tasks.

  4. The paper also explores black-box attacks, where the attacker can only query the finetuned model. They show this enables perfect membership inference attacks and data extraction using model stealing techniques.

  5. Finally, the authors use their backdoors to mount tight end-to-end attacks on differentially private training, challenging the common assumption that the privacy guarantees of DP-SGD are overly conservative in practice.
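
This write-to-weights effect has a simple gradient-level intuition. The sketch below is a toy illustration of that intuition only, not the paper's backdoor construction (which targets transformer weights and is engineered to fire once and then deactivate): for a linear layer updated by a single SGD step on a single example, every row of the weight update is a scalar multiple of that example's input, so differencing the weights before and after the step recovers the input up to scale.

```python
import torch

torch.manual_seed(0)
d_in, d_out = 8, 4
layer = torch.nn.Linear(d_in, d_out, bias=False)
w_before = layer.weight.detach().clone()

x = torch.randn(1, d_in)            # the single "trapped" finetuning example
y = torch.randn(1, d_out)           # arbitrary target for the toy squared-error loss
opt = torch.optim.SGD(layer.parameters(), lr=0.1)

loss = ((layer(x) - y) ** 2).sum()  # one finetuning step on just this example
loss.backward()
opt.step()

# The SGD update is -lr * grad, and grad(W) = 2 * (W x - y) x^T, so every row of the
# weight difference is a scalar multiple of the input x: reading it back recovers x.
delta_w = layer.weight.detach() - w_before
cos = torch.nn.functional.cosine_similarity(delta_w[0], x[0], dim=0)
print(f"|cosine(weight-delta row, trapped input)| = {cos.abs().item():.4f}")  # ~1.0
```

In the paper's setting the attacker additionally has to ensure that the trap captures exactly one sample and is not overwritten by later gradient steps, which is what the single-use construction in point 2 provides.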

Overall, the work highlights a crucial and overlooked supply chain attack on machine learning privacy, and emphasizes the need for more stringent privacy protections when working with untrusted shared models.


Stats
Practitioners commonly download pretrained machine learning models from open repositories and finetune them to fit specific applications. The authors show that this practice introduces a new risk of privacy backdoors, where an attacker can tamper with a pretrained model's weights to fully compromise the privacy of the finetuning data. The authors demonstrate their privacy backdoor attacks on popular transformer models like ViT and BERT, showing they can reconstruct dozens of finetuning examples across various downstream tasks. The authors further show that their backdoors enable tight end-to-end attacks on differentially private training, challenging the common assumption that the privacy guarantees of DP-SGD are overly conservative in practice.
Quotes
"By tampering with a pretrained model's weights, an attacker can fully compromise the privacy of the finetuning data." "Our backdoor attacks create 'data traps' that directly write some data points to the model weights during finetuning. The trapped data can then be extracted by reading from the finetuned model's weights." "We further show that our backdoors enable simpler, perfect membership inference attacks, which infer with 100% accuracy whether a data point was used for training."

Key Insights From

by Shan... at arxiv.org, 04-02-2024

https://arxiv.org/pdf/2404.00473.pdf
Privacy Backdoors

Further Questions

How can the security and privacy of pretrained models be improved to mitigate the risk of such supply chain attacks?

To enhance the security and privacy of pretrained models and mitigate the risk of supply chain attacks like privacy backdoors, several measures can be implemented:

  1. Model Verification: Implement robust verification processes to ensure the integrity of pretrained models before they are shared on repositories, including code reviews, vulnerability testing, and validation of model behavior (a minimal integrity-check sketch follows this list).

  2. Model Transparency: Provide detailed documentation of the training process, data sources, and any modifications made to the pretrained model, so users can understand its origins and potential risks.

  3. Secure Repositories: Use repositories with strict access controls and authentication mechanisms to prevent unauthorized tampering with pretrained models.

  4. Regular Audits: Conduct regular security audits of pretrained models to identify and address vulnerabilities or backdoors that may have been introduced.

  5. Data Protection: Safeguard sensitive training data used to develop pretrained models with encryption, access controls, and data minimization techniques.

  6. End-to-End Encryption: Protect the communication and transfer of pretrained models between entities in the supply chain with end-to-end encryption.

  7. User Education: Educate users and developers on best practices for handling pretrained models, including verifying the source, using secure configurations, and monitoring for suspicious activity.
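
As one concrete instance of the verification item above, a downloaded checkpoint can be checked against a digest published by the model provider before it is ever loaded. This is a minimal sketch; the file path and expected digest are hypothetical placeholders, not values from the paper.

```python
import hashlib

EXPECTED_SHA256 = "0123abcd..."           # hypothetical digest published by the model provider
CHECKPOINT_PATH = "pretrained_model.bin"  # hypothetical local path of the downloaded weights

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large checkpoints fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of_file(CHECKPOINT_PATH) != EXPECTED_SHA256:
    raise ValueError("Checkpoint hash mismatch: refusing to load possibly tampered weights.")
```

Note that hash checking only detects tampering after publication; it does not protect against a provider whose published weights were malicious to begin with, which is why the behavioral validation and audits above remain necessary.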

How might the insights from this work on privacy backdoors be applied to improve the robustness and security of differentially private machine learning algorithms?

The insights gained from the research on privacy backdoors can be leveraged to strengthen the robustness and security of differentially private machine learning algorithms in the following ways:

  1. Adversarial Testing: Apply adversarial testing to differentially private models to surface vulnerabilities and backdoors that could compromise their privacy guarantees.

  2. Backdoor Detection: Develop mechanisms to detect and mitigate privacy backdoors in differentially private models, analogous to the techniques this work uses to construct and exploit backdoors in pretrained models.

  3. Privacy-Aware Training: Incorporate training strategies that account for the possibility of backdoors and aim to minimize their impact on the model's privacy guarantees.

  4. Model Verification: Rigorously verify the integrity and privacy-preserving properties of differentially private models, including vulnerability testing and empirical privacy audits (a sketch of the standard audit bound follows this list).

  5. Secure Model Sharing: Establish secure protocols for sharing differentially private models to prevent unauthorized modifications that could introduce privacy vulnerabilities.

  6. Continuous Monitoring: Monitor deployed differentially private models for anomalous behavior or deviations from their expected privacy guarantees.

By applying these insights, differentially private machine learning algorithms can be made more robust and resistant to privacy attacks.
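
The "empirical privacy audit" idea above connects directly to the paper's tight end-to-end attacks on DP-SGD: an (epsilon, delta)-DP mechanism bounds any membership inference attack by TPR <= exp(epsilon) * FPR + delta, so observed attack performance implies a lower bound on the true epsilon. The sketch below computes that standard bound; it is general DP-auditing background, not code from the paper.

```python
import math

def empirical_epsilon_lower_bound(tpr: float, fpr: float, delta: float = 1e-5) -> float:
    """Lower bound on epsilon implied by a membership inference attack with the given TPR/FPR."""
    fnr = 1.0 - tpr
    bounds = []
    if fpr > 0 and tpr > delta:
        bounds.append(math.log((tpr - delta) / fpr))          # from TPR <= e^eps * FPR + delta
    if fnr > 0 and (1.0 - fpr) > delta:
        bounds.append(math.log((1.0 - fpr - delta) / fnr))    # symmetric bound on the other error
    return max(bounds) if bounds else float("inf")

# A near-perfect attack, as the backdoors enable, forces a large empirical epsilon,
# i.e. the audit becomes tight rather than loose.
print(empirical_epsilon_lower_bound(tpr=0.999, fpr=0.001))    # ~6.9
```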

What other types of backdoors or vulnerabilities might exist in the machine learning supply chain beyond the privacy attacks explored in this paper?

Beyond privacy attacks like the privacy backdoors explored in the paper, other types of backdoors and vulnerabilities in the machine learning supply chain include:

  1. Model Poisoning: Attackers inject malicious data into the training set to manipulate the model's behavior, leading to biased or incorrect predictions.

  2. Trojan Models: Malicious actors embed trigger patterns in the model so that it misbehaves when specific conditions are met.

  3. Model Stealing: Adversaries extract the architecture or parameters of a model, compromising its intellectual property and potentially revealing sensitive information.

  4. Adversarial Examples: Small, carefully crafted perturbations of the input cause incorrect predictions, potentially enabling security breaches (a minimal sketch of this effect follows the list).

  5. Data Leakage: Inadequate data protection lets sensitive information from the training data surface through the model's predictions.

  6. Model Inversion: Attackers reverse-engineer the model to reconstruct sensitive training data or infer information about individuals represented in it.

Addressing these vulnerabilities with robust security measures throughout the machine learning supply chain helps organizations protect their models and data from malicious exploitation.
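
To make the adversarial-examples item concrete, here is a minimal FGSM-style sketch (my own illustration, unrelated to the paper's attacks): a one-step perturbation of bounded size in the direction of the loss gradient that can change a classifier's prediction.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 3)               # stand-in for any differentiable classifier
x = torch.randn(1, 16, requires_grad=True)   # clean input
y = torch.tensor([0])                        # assumed true label
eps = 0.1                                    # perturbation budget

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()
x_adv = (x + eps * x.grad.sign()).detach()   # one signed-gradient step that raises the loss

# The perturbed input often changes the predicted class even though it stays close to x.
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```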