Generating Synthetic On-Policy Trajectories for Offline Reinforcement Learning via Policy-Guided Diffusion
Policy-guided diffusion generates synthetic trajectories that balance action likelihoods under the target and behavior policies, yielding plausible trajectories that have high probability under the target policy while keeping dynamics error low.
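
The balance between the two policies can be illustrated with a toy sketch. It is not the paper's implementation and contains no diffusion model: it assumes a single one-dimensional action, a Gaussian behavior distribution, and a Gaussian target policy, and it combines their score functions with a hypothetical weight `guidance_coef`. All names and numbers below are illustrative assumptions.

```python
def gaussian_score(x, mean, std):
    """Gradient of log N(x; mean, std**2) with respect to x."""
    return (mean - x) / std**2


def guided_score(action, guidance_coef,
                 behavior_mean=0.0, behavior_std=1.0,
                 target_mean=2.0, target_std=1.0):
    """Behavior score plus a scaled target-policy score (toy Gaussians)."""
    behavior_term = gaussian_score(action, behavior_mean, behavior_std)
    target_term = gaussian_score(action, target_mean, target_std)
    return behavior_term + guidance_coef * target_term


def most_likely_action(guidance_coef, steps=200, step_size=0.1):
    """Gradient ascent on the guided log-density, starting from action 0."""
    action = 0.0
    for _ in range(steps):
        action += step_size * guided_score(action, guidance_coef)
    return action


if __name__ == "__main__":
    # Larger coefficients pull the action from the behavior mean (0)
    # toward the target-policy mean (2).
    for coef in (0.0, 1.0, 3.0):
        print(coef, round(most_likely_action(coef), 3))
```

With these assumed values, a coefficient of 0 leaves the action at the behavior mean, while coefficients of 1 and 3 move it to roughly 1.0 and 1.5. The action never leaves the region the behavior distribution supports, which is the intuition behind keeping dynamics error low while raising target-policy probability.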