The paper introduces a reinforcement learning (RL) framework for learning policies that maximize reward while minimizing how much the agent's actions disclose about sensitive state variables. The key ideas are:
Formulate the problem as a constrained optimization: maximize reward subject to a constraint on the mutual information between the agent's actions and the sensitive state variables (a sketch of this formulation follows the list below).
Develop several gradient estimators to efficiently optimize this constrained objective, including a model-based estimator, a model-free upper bound estimator, and a reparameterization-based estimator for differentiable environments.
Demonstrate the effectiveness of the approach on a variety of tasks, including a tabular web connection problem, a 2D continuous control task, and high-dimensional simulated robotics tasks. The learned policies effectively hide the sensitive state variables while maintaining high reward.
Compare the approach to differentially private RL and a previous mutual information regularization method, showing the advantages of the proposed mutual information constraint formulation.
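As a rough sketch of the constrained formulation from the first point above (the symbols here are my own notation and may differ from the paper's: u denotes the sensitive state variables, ε the disclosure budget, and λ the Lagrange multiplier):

```latex
\begin{aligned}
\max_{\pi}\;\; & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
&& \text{subject to} \quad I\!\left(a_{1:T};\, u\right) \le \epsilon, \\
\mathcal{L}(\pi, \lambda) \;=\;\; & \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
- \lambda \,\bigl(I(a_{1:T};\, u) - \epsilon\bigr)
&& \text{(Lagrangian relaxation).}
\end{aligned}
```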
The key insight is that by directly optimizing the mutual information between actions and the sensitive state, the agent learns policies that plan ahead to reduce information disclosure rather than simply adding noise. This makes it possible to achieve high reward while satisfying strong privacy guarantees.
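To make the mutual-information penalty concrete, here is a minimal, self-contained Python sketch (my own illustration, not the authors' code): a naive plug-in estimate of I(actions; sensitive variable) from rollout samples in a tabular setting, folded into a Lagrangian-penalized return. The function names and the histogram estimator are assumptions; the paper instead relies on the dedicated gradient estimators listed above (model-based, model-free upper bound, reparameterized) to optimize this quantity.

```python
import numpy as np

def plugin_mutual_information(actions, sensitive, n_actions, n_sensitive):
    """Naive plug-in (histogram) estimate of I(A; U) from sampled
    (action, sensitive-variable) pairs. Illustration only."""
    joint = np.zeros((n_actions, n_sensitive))
    for a, u in zip(actions, sensitive):
        joint[a, u] += 1.0
    joint /= joint.sum()
    p_a = joint.sum(axis=1, keepdims=True)   # marginal over actions
    p_u = joint.sum(axis=0, keepdims=True)   # marginal over sensitive values
    prod = p_a @ p_u                          # product of marginals (independence baseline)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / prod[mask])))

def lagrangian_objective(episode_rewards, mi_estimate, lam, epsilon):
    """Return minus a Lagrangian penalty on estimated information disclosure."""
    return float(np.sum(episode_rewards)) - lam * (mi_estimate - epsilon)

# Tiny usage example with fabricated rollout data (2 actions, 2 sensitive values):
actions   = [0, 1, 1, 0, 1, 0, 0, 1]
sensitive = [0, 1, 1, 0, 0, 1, 0, 1]
mi = plugin_mutual_information(actions, sensitive, n_actions=2, n_sensitive=2)
obj = lagrangian_objective(episode_rewards=[1.0, 0.5, 2.0], mi_estimate=mi, lam=5.0, epsilon=0.1)
print(f"estimated I(A;U) = {mi:.3f}, penalized objective = {obj:.3f}")
```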
Source: key takeaways from the paper by Chris Cundy, ... on arxiv.org, 04-17-2024. https://arxiv.org/pdf/2012.15019.pdf