Belief-Enriched Pessimistic Q-Learning for Robust Reinforcement Learning Against Adversarial State Perturbations
In this work, the authors propose a robust RL algorithm that combines belief-state inference with diffusion-based state purification to improve performance under strong adversarial state perturbations. The core idea is to derive a pessimistic policy that safeguards against the agent's uncertainty about the true state.
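The pessimistic decision rule can be illustrated with a minimal tabular sketch. It assumes a hypothetical `belief_set` of candidate true states that the agent considers plausible given a possibly perturbed observation, along with a Q-table keyed by state and action. The agent then picks the action whose worst-case value over that set is highest (a maxmin rule). This is not the authors' implementation: belief inference and diffusion-based purification are omitted entirely, and `pessimistic_action` is an illustrative name.

```python
from typing import Dict, Hashable, Sequence

State = Hashable
Action = Hashable


def pessimistic_action(
    q_table: Dict[State, Dict[Action, float]],
    belief_set: Sequence[State],
    actions: Sequence[Action],
) -> Action:
    """Return the action maximizing the worst-case Q-value over the belief set.

    Hypothetical sketch: `belief_set` stands in for the true states the agent
    considers plausible after observing a possibly perturbed state.
    """
    if not belief_set:
        raise ValueError("belief_set must contain at least one candidate state")

    def worst_case(a: Action) -> float:
        # Pessimism: assume the least favorable plausible true state.
        return min(q_table[s][a] for s in belief_set)

    # Maxmin: best action under the worst plausible true state.
    return max(actions, key=worst_case)


# Toy Q-table: "right" looks best in s0 but is disastrous in s1.
q = {
    "s0": {"left": 1.0, "right": 5.0},
    "s1": {"left": 2.0, "right": -3.0},
}
actions = ["left", "right"]

# Trusting the observation s0 alone, the agent acts greedily.
print(pessimistic_action(q, ["s0"], actions))        # right
# If the true state may be s0 or s1, the agent hedges.
print(pessimistic_action(q, ["s0", "s1"], actions))  # left
```

The toy example shows why pessimism helps under perturbations. An agent that trusts the observed state picks the action with the highest nominal value, while an agent that accounts for uncertainty about the true state avoids an action that could be catastrophic in another plausible state.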