The paper introduces TRAJDELETER, a method for enabling offline reinforcement learning (RL) agents to unlearn specific trajectories from their training dataset. Offline RL trains agents using pre-collected datasets, which is useful when online interactions are impractical or risky. However, there is a growing demand to allow agents to rapidly and completely eliminate the influence of specific trajectories, for reasons such as privacy, security, or copyright.
The key idea of TRAJDELETER is to guide the agent to exhibit deteriorating performance on states drawn from the trajectories targeted for unlearning, while maintaining its original performance on the remaining trajectories. TRAJDELETER consists of two phases:
Forgetting: This phase minimizes the value function Q (which estimates the expected cumulative reward) on the unlearning samples while simultaneously maximizing Q on the remaining samples, balancing thorough unlearning against degradation of the agent's performance (see the sketch after this list).
Convergence Training: This phase minimizes the discrepancy between the cumulative rewards obtained by following the original agent and the unlearned agent on states from the remaining trajectories, ensuring that the unlearned agent converges.
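To make the two phases concrete, here is a minimal PyTorch-style sketch of the two objectives. It assumes a generic actor-critic setup with a critic q_net(s, a) and a policy policy(s); the function names, the balancing weight lam, and the use of a critic-value mismatch as a proxy for the cumulative-reward discrepancy are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def forgetting_loss(q_net, policy, forget_batch, remain_batch, lam=1.0):
    """Phase 1 (forgetting): push Q-values down on states drawn from the
    trajectories being unlearned, while keeping Q high on the remaining
    data. `lam` is an assumed balancing weight, not the paper's notation."""
    s_forget, _ = forget_batch   # states/actions from unlearning trajectories
    s_remain, _ = remain_batch   # states/actions from remaining trajectories
    q_forget = q_net(s_forget, policy(s_forget)).mean()
    q_remain = q_net(s_remain, policy(s_remain)).mean()
    # Minimizing this loss minimizes Q on forgotten states and
    # maximizes Q on retained states.
    return q_forget - lam * q_remain

def convergence_loss(q_unlearned, q_original, remain_batch):
    """Phase 2 (convergence training): keep the unlearned agent's value
    estimates on remaining states close to the original agent's, a
    critic-space proxy for matching cumulative rewards."""
    s_remain, a_remain = remain_batch
    with torch.no_grad():
        q_ref = q_original(s_remain, a_remain)  # frozen original critic
    return F.mse_loss(q_unlearned(s_remain, a_remain), q_ref)
```

Splitting the objective this way reflects the paper's design: the first loss actively degrades value estimates only where forgetting is required, and the second anchors the unlearned agent to the original agent's behavior on retained data so training does not drift.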
The paper also introduces TRAJAUDITOR, a simple yet efficient method for evaluating whether TRAJDELETER has successfully eliminated the influence of the specified trajectories from the offline RL agent. TRAJAUDITOR fine-tunes the original agent to generate shadow agents and applies state perturbations to create diverse auditing bases, significantly reducing the time required compared to training shadow agents from scratch (see the sketch below).
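The following is a hedged sketch of a TRAJAUDITOR-style membership test. It assumes shadow agents fine-tuned from the original agent with and without the target trajectory, an agent interface `.predict(states)`, and a nearest-profile decision rule; the noise scale and decision logic are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def audit_trajectory(audited_agent, shadows_with, shadows_without,
                     traj_states, n_perturb=5, noise_scale=0.01):
    """Decide whether `audited_agent` still carries the influence of the
    trajectory whose states are `traj_states`, by comparing its behavior
    to shadow agents fine-tuned with / without that trajectory."""
    def action_profile(agent):
        # Average the agent's actions over small Gaussian state
        # perturbations to build a diverse, robust auditing basis.
        outs = [agent.predict(traj_states +
                              np.random.normal(0.0, noise_scale,
                                               traj_states.shape))
                for _ in range(n_perturb)]
        return np.mean(outs, axis=0)

    target = action_profile(audited_agent)
    dist_with = np.mean([np.linalg.norm(target - action_profile(s))
                         for s in shadows_with])
    dist_without = np.mean([np.linalg.norm(target - action_profile(s))
                            for s in shadows_without])
    # The trajectory counts as forgotten if the audited agent behaves
    # more like the shadows that never trained on it.
    return dist_with > dist_without
```

Because the shadow agents are fine-tuned from the original agent rather than trained from scratch, the audit stays cheap while the perturbed states keep it from overfitting to any single evaluation point.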
Extensive experiments on six offline RL algorithms and three tasks demonstrate that TRAJDELETER requires only about 1.5% of the time needed to retrain from scratch. It unlearns an average of 94.8% of the targeted trajectories and still performs well in actual environment interactions after unlearning, outperforming baseline methods.
Source: Chen Gong, Ke... at arxiv.org, 04-22-2024: https://arxiv.org/pdf/2404.12530.pdf