Core Concept
Counterfactual explanations can be exploited to mount efficient model extraction attacks on machine-learning-as-a-service platforms, and incorporating differential privacy into the counterfactual generation process can mitigate such attacks while largely preserving the quality of the explanations.
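To see why CFs leak so much information, consider a minimal sketch that assumes a linear target classifier. This is not the model or the CF generator used in this work. For a linear model, the minimum-L2 counterfactual has a closed form: project the input onto the decision hyperplane, then step slightly past it. `predict`, `linear_counterfactual`, and `margin` are hypothetical names.

```python
import numpy as np

def predict(w, b, x):
    """Hypothetical linear target: class 1 if w.x + b > 0, else class 0."""
    return int(np.dot(w, x) + b > 0)

def linear_counterfactual(w, b, x, margin=1e-3):
    """Minimum-L2 counterfactual for a linear classifier.

    Projects x onto the hyperplane w.x + b = 0, then overshoots by `margin`
    along the unit normal so that the predicted class flips.
    """
    score = np.dot(w, x) + b
    norm = np.linalg.norm(w)
    signed_dist = score / norm  # signed distance from x to the boundary
    return x - (signed_dist + np.sign(score) * margin) * (w / norm)

w, b = np.array([2.0, -1.0]), -0.5
x = np.array([1.0, 0.5])
x_cf = linear_counterfactual(w, b, x)
print(predict(w, b, x), predict(w, b, x_cf))  # 1 0
# x - x_cf is parallel to w, so one CF exposes the boundary's orientation.
print((x - x_cf) / np.linalg.norm(x - x_cf), w / np.linalg.norm(w))
```

Each answered query therefore hands the attacker a labeled point just across the decision boundary. In this linear case, the difference `x - x_cf` is parallel to the weight vector, so a single CF reveals the orientation of the boundary.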
Summary
The work examines the vulnerability of machine-learning-as-a-service (MLaaS) platforms to model extraction attacks (MEAs) that are made easier when the platforms provide explainable AI (XAI) outputs, such as counterfactual (CF) explanations.
Key highlights:
MLaaS platforms increasingly offer explanations, including CFs, alongside model predictions, and attackers can exploit these explanations to perform MEAs.
The authors propose a novel MEA approach based on knowledge distillation (KD) that leverages CFs to efficiently extract a substitute model.
To mitigate the risks of providing CFs, the authors incorporate differential privacy (DP) into the CF generation process.
Experimental results show that the proposed KD-based MEA outperforms baseline methods, and that integrating DP into CF generation can effectively mitigate MEAs while preserving explanation quality to an acceptable extent.
The analysis examines the impact of DP on the quality of CFs, including metrics such as prediction gain, actionability, and realism.
The work highlights the importance of balancing the need for transparency through XAI techniques and the necessity of preserving the privacy of training data in MLaaS scenarios.
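Below is a minimal sketch of the KD-based extraction idea. It relies on simplifying assumptions that do not match the authors' setup. The hypothetical `target_api` stands in for the MLaaS endpoint: it uses a logistic model instead of the 16-hidden-layer DNN and returns a class-1 probability plus a minimum-L2 CF for each query. The student is a logistic regression instead of a 3-hidden-layer DNN. "Distillation" here means fitting the student to the target's probabilities (soft labels) rather than to hard labels, with no temperature scaling. Obtaining soft labels for the CF points by querying them is also an assumption. `W_T`, `B_T`, `target_api`, and `distill` are illustrative names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical black-box target; the attacker never reads W_T or B_T,
# only the outputs of target_api.
W_T, B_T = np.array([1.5, -2.0, 0.5]), 0.3

def target_api(X):
    """Hypothetical MLaaS endpoint: class-1 probabilities plus minimum-L2 CFs."""
    scores = X @ W_T + B_T
    norm = np.linalg.norm(W_T)
    # Step just past the decision boundary along the unit weight normal.
    step = scores / norm + np.sign(scores) * 1e-3
    cfs = X - step[:, None] * (W_T / norm)
    return sigmoid(scores), cfs

def distill(X, soft_targets, epochs=2000, lr=0.5):
    """Fit a logistic-regression student to soft targets with a BCE loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        residual = sigmoid(X @ w + b) - soft_targets  # dBCE/dlogit
        w -= lr * X.T @ residual / len(X)
        b -= lr * residual.mean()
    return w, b

rng = np.random.default_rng(0)
X_q = rng.normal(size=(50, 3))      # attacker's query budget
probs, cfs = target_api(X_q)        # each answer also carries a CF
cf_probs, _ = target_api(cfs)       # soft labels for CF points (extra queries)
X_train = np.vstack([X_q, cfs])
t_train = np.concatenate([probs, cf_probs])
w_s, b_s = distill(X_train, t_train)

# Fidelity: how often the extracted student agrees with the target.
X_test = rng.normal(size=(1000, 3))
agreement = np.mean((X_test @ w_s + b_s > 0) == (X_test @ W_T + B_T > 0))
print(f"student/target label agreement: {agreement:.3f}")
```

Because the student in this sketch has exactly the target's functional form, it can recover the target almost perfectly. With a DNN target and a smaller DNN substitute, the fit is only approximate. Removing `cfs` from `X_train` gives a query-only baseline for comparison.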
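The summary does not specify which DP mechanism is used. As an assumption, one standard way to make a trained component such as a CF generator differentially private is DP-SGD (Abadi et al., 2016). DP-SGD clips each example's gradient to an L2 bound, sums the clipped gradients, adds Gaussian noise scaled to that bound, and averages the result. The sketch below shows one such update step in NumPy. `dp_sgd_step` is a hypothetical helper, and privacy accounting (computing ε for a given noise multiplier and number of steps) is omitted.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on a flat parameter vector.

    per_example_grads has shape (batch_size, n_params).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Rescale each example's gradient so its L2 norm is at most clip_norm.
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
    # Gaussian noise with std noise_multiplier * clip_norm, added to the sum.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
params = np.zeros(2)
grads = np.array([[3.0, 4.0], [0.0, 0.5]])
# With noise_multiplier=0 this reduces to clipped SGD: [-0.3, -0.65].
print(dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0, noise_multiplier=0.0, rng=rng))
# With noise, the update is randomized; larger multipliers give stronger privacy.
print(dp_sgd_step(params, grads, lr=1.0, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Raising `noise_multiplier` strengthens the privacy guarantee but makes the trained generator noisier. This is the same trade-off the experiments examine between mitigating MEAs and preserving CF quality.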
Statistics
The target model is a deep neural network with 16 hidden layers.
The threat model (the attacker's substitute model) is a deep neural network with 3 hidden layers.
The authors use 3 real-world datasets: Give Me Some Credit, Credit Card Fraud, and California Housing.
Quotes
"Counterfactual explanations can unveil insights about the inner workings of the model which could be exploited by malicious users."
"The inclusion of a privacy layer impacts the performance of the explainer, the quality of CFs, and results in a reduction in the MEA performance."