Improving Zero-Shot Reinforcement Learning Performance on Low-Quality Datasets
Existing zero-shot reinforcement learning methods degrade when trained on small, homogeneous datasets because they overestimate the values of out-of-distribution actions. Conservative regularization can mitigate this overestimation, improving performance on low-quality datasets without sacrificing performance on high-quality ones.
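To make the idea of conservative regularization concrete, here is a minimal sketch of one common form: a CQL-style penalty for discrete actions. The summary above does not say which regularizer is used, so this choice is an assumption for illustration. The names `conservative_penalty`, `regularized_loss`, and `alpha` are hypothetical as well.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conservative_penalty(q_values, dataset_actions):
    """CQL-style penalty (an assumed regularizer, not taken from the source).

    q_values: list of rows, one per state, each holding Q(s, a) for every action.
    dataset_actions: index of the action actually observed in the dataset per state.

    For each state the gap logsumexp_a Q(s, a) - Q(s, a_data) is non-negative and
    grows when actions absent from the dataset receive high values.
    """
    gaps = [logsumexp(row) - row[a] for row, a in zip(q_values, dataset_actions)]
    return sum(gaps) / len(gaps)


def regularized_loss(td_loss, q_values, dataset_actions, alpha=1.0):
    """Add the penalty, weighted by the hypothetical coefficient alpha, to a TD loss."""
    return td_loss + alpha * conservative_penalty(q_values, dataset_actions)


if __name__ == "__main__":
    actions = [0, 1]                      # actions seen in the dataset
    flat = [[0.0] * 4, [0.0] * 4]         # no action is favoured
    inflated = [[0.0, 0.0, 0.0, 10.0],    # an unseen action is overvalued
                [0.0, 0.0, 0.0, 10.0]]
    print(conservative_penalty(flat, actions))
    print(conservative_penalty(inflated, actions))
```

The penalty is larger for the second batch, where an action that never appears in the dataset has an inflated value, so minimizing the regularized loss pushes such values down. When the highly valued action is the one in the dataset, the penalty is close to zero, which suggests why this kind of term need not hurt performance on high-quality data.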