The paper presents a game-theoretic framework to model the strategic interactions between an adversary and a machine learning (ML) system in the context of explanation-based membership inference attacks (MIAs). The key highlights are:
The authors model the interactions as a continuous-time stochastic signaling game in which the variance of the explanations generated by the ML system evolves according to a Geometric Brownian Motion (GBM) process.
They characterize the Markov Perfect Equilibrium (MPE) of the game as a pair of optimal functions U(π) and L(π): U(π) is the optimal variance path for the explanations generated by the system, and L(π) is the optimal variance path for the explanations the system gives to an adversary after adding some noise.
The authors evaluate the game for different gradient-based explanation methods (Integrated Gradients, Gradient*Input, LRP, Guided Backpropagation) and popular datasets (Purchase, Texas, CIFAR-10, CIFAR-100, Adult Census). They demonstrate that an adversary's ability to launch an MIA depends on factors such as the chosen explanation method, input dimensionality, model size, and the number of training rounds.
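To make the GBM assumption in the highlights concrete, here is a minimal sketch of how a single GBM path can be simulated. It is only an illustration, not the authors' implementation: the function name `simulate_gbm`, the drift `mu`, the volatility `sigma`, the initial value `s0`, and the time horizon are all assumed values chosen for the example, since the summary does not state them. The sketch uses the standard closed-form solution of GBM, S(t) = S(0) exp((mu - sigma^2/2) t + sigma W(t)), so the simulated quantity stays positive, as a variance must.

```python
import numpy as np


def simulate_gbm(s0, mu, sigma, horizon, n_steps, seed=None):
    """Simulate one Geometric Brownian Motion path on [0, horizon].

    All parameters are illustrative assumptions, not values from the paper:
    s0 is the initial explanation variance, mu the drift, sigma the volatility.
    Returns an array of n_steps + 1 values, starting at s0.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    # Brownian increments: independent N(0, dt) draws.
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    # Exact log-space update of GBM over each step.
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * dw
    log_path = np.concatenate(([0.0], np.cumsum(log_increments)))
    return s0 * np.exp(log_path)


# Hypothetical parameters, used only to show the call.
path = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, horizon=1.0, n_steps=250, seed=0)
```

Working in log space avoids the discretization bias of a naive Euler step and guarantees that every simulated value is strictly positive.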