The paper presents a game-theoretic framework to model the strategic interactions between an adversary and a machine learning (ML) system in the context of explanation-based membership inference attacks (MIAs). The key highlights are:
The authors model the interactions as a continuous-time stochastic signaling game, in which the variance of the explanations generated by the ML system evolves according to a Geometric Brownian Motion (GBM) process.
They characterize the Markov Perfect Equilibrium (MPE) of the game as a pair of optimal functions U(π) and L(π), where U(π) represents the optimal variance path for the explanations generated by the system, and L(π) represents the optimal variance path for the explanations the system gives to an adversary after adding some noise.
The authors evaluate the game for different gradient-based explanation methods (Integrated Gradients, Gradient*Input, LRP, Guided Backpropagation) and popular datasets (Purchase, Texas, CIFAR-10, CIFAR-100, Adult census). They demonstrate that an adversary's ability to launch an MIA depends on factors such as the chosen explanation method, input dimensionality, model size, and the number of training rounds.
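To make the GBM assumption in the highlights concrete, the following is a minimal sketch of how a single explanation-variance path could be simulated. It is not the authors' code: the function name `simulate_gbm` and the drift, volatility, and initial-variance values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_gbm(v0, mu, sigma, horizon, n_steps, seed=0):
    """Simulate one Geometric Brownian Motion path.

    Uses the exact log-normal update of dV = mu * V dt + sigma * V dW,
    so the path stays strictly positive, as a variance should.
    All parameter values passed in are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Brownian increments: independent N(0, dt) draws.
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    # Increments of log V over each time step.
    log_steps = (mu - 0.5 * sigma**2) * dt + sigma * dw
    # Prepend 0 so the path starts exactly at v0.
    log_path = np.log(v0) + np.concatenate(([0.0], np.cumsum(log_steps)))
    return np.exp(log_path)


# Hypothetical parameters: initial variance 0.1, drift 0.05, volatility 0.2.
path = simulate_gbm(v0=0.1, mu=0.05, sigma=0.2, horizon=1.0, n_steps=250)
```

The array `path` holds the simulated variance at each time step. In the paper's setting, the equilibrium functions U(π) and L(π) would be defined over such a process, but their computation is not reproduced here.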
Key insights distilled from:
by Kavita Kumar... at arxiv.org, 04-11-2024
https://arxiv.org/pdf/2404.07139.pdf