Mildly Conservative Model-Based Offline Reinforcement Learning Algorithm Outperforms Prior Methods on Benchmark Tasks
The proposed DOMAIN algorithm uses an adaptive sampling distribution over model-generated data to obtain mildly conservative value estimates, and it outperforms prior model-based offline RL methods on benchmark tasks.
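The summary does not say how the adaptive sampling distribution is built. The following is a minimal sketch under two assumptions of ours, neither stated above. First, each model-generated sample is reweighted by a softmax over an estimated per-sample model error, so samples the learned model is less sure about receive more weight. Second, that weighted distribution enters a COMBO-style conservative penalty, which pushes Q-values down on model data and up on offline data. The names `adaptive_sampling_weights`, `conservative_penalty`, `beta`, and `temperature` are hypothetical and are not taken from the DOMAIN paper.

```python
import math
from typing import List


def adaptive_sampling_weights(model_errors: List[float], temperature: float = 1.0) -> List[float]:
    """Hypothetical adaptive distribution: softmax over per-sample model-error estimates.

    Samples with a larger estimated model error get a larger share of the
    probability mass (assumption, not the paper's stated rule).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [e / temperature for e in model_errors]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


def conservative_penalty(
    q_model: List[float],
    model_errors: List[float],
    q_data: List[float],
    beta: float = 1.0,
    temperature: float = 1.0,
) -> float:
    """COMBO-style regularizer: beta * (E_d[Q on model data] - E_D[Q on offline data]).

    d is the hypothetical adaptive distribution above and D is the offline dataset.
    Minimizing this term lowers Q mostly where the model is least reliable, which
    is one way to read "mildly" conservative (assumption).
    """
    if len(q_model) != len(model_errors) or not q_data:
        raise ValueError("need one error per model sample and a non-empty offline batch")
    w = adaptive_sampling_weights(model_errors, temperature)
    weighted_model_q = sum(wi * qi for wi, qi in zip(w, q_model))
    mean_data_q = sum(q_data) / len(q_data)
    return beta * (weighted_model_q - mean_data_q)
```

With equal model errors the weights collapse to a uniform distribution, and the penalty reduces to the plain COMBO-style difference of means. A higher `temperature` flattens the weights, and a lower one concentrates the penalty on the highest-error samples.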