Enhancing Sample-Efficient Exploration in Continuous Control Tasks through Model-Based Intrinsic Motivation and Active Online Planning
This paper introduces the ACE planner, a reinforcement-learning algorithm that combines a predictive model with off-policy learning to achieve sample-efficient exploration in continuous control tasks. The ACE planner leverages a novelty-aware terminal value function together with an exponentially re-weighted version of the model-based value expansion (MVE) target to boost exploration and accelerate credit assignment.
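The abstract does not spell out the exact form of the re-weighted MVE target; a minimal sketch, assuming a TD(λ)-style exponential mixture over the h-step model-rollout targets (the function name, signature, and weighting scheme are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def mve_lambda_target(rewards, terminal_values, gamma=0.99, lam=0.95):
    """Exponentially re-weighted model-based value expansion (MVE) target.

    rewards:         imagined rewards r_0 .. r_{H-1} from the learned model
    terminal_values: value estimates V(s_1) .. V(s_H) at the imagined states
    Returns a TD(lambda)-style mixture of the 1..H-step expansion targets:
        G = (1 - lam) * sum_{h=1}^{H-1} lam^{h-1} G_h  +  lam^{H-1} G_H
    where G_h = sum_{t<h} gamma^t r_t + gamma^h V(s_h).
    """
    H = len(rewards)
    # Cumulative discounted reward sums: sum_{t<h} gamma^t r_t for h = 1..H
    discounted = np.cumsum([gamma**t * r for t, r in enumerate(rewards)])
    # h-step targets bootstrapped with the terminal value function
    targets = [discounted[h - 1] + gamma**h * terminal_values[h - 1]
               for h in range(1, H + 1)]
    # Exponential weights sum to 1; the final H-step target absorbs the tail mass
    weights = [(1 - lam) * lam**(h - 1) for h in range(1, H)] + [lam**(H - 1)]
    return float(np.dot(weights, targets))
```

With `lam=0` this reduces to the one-step TD target, and with `lam=1` to the full H-step rollout target, mirroring how TD(λ) interpolates between bootstrapped and Monte Carlo returns.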