Efficient Offline Reinforcement Learning through Grid-Mapping Pseudo-Count Constraint
The authors propose Grid-Mapping Pseudo-Count (GPC), a method for accurately quantifying uncertainty in continuous offline reinforcement learning. Combining GPC with the Soft Actor-Critic framework, they develop the GPC-SAC algorithm, which achieves better performance at lower computational cost than existing algorithms.
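The core idea behind a grid-mapping pseudo-count is to discretize continuous state-action pairs into grid cells and use per-cell visit counts as an uncertainty signal: rarely visited cells indicate out-of-distribution behavior that should be penalized. The sketch below is illustrative only; the bin resolution, cell hashing, and `1/sqrt(n+1)` bonus form are assumptions, not the authors' exact formulation.

```python
import numpy as np

class GridPseudoCount:
    """Illustrative sketch of a grid-based pseudo-count for continuous
    (state, action) pairs. Details are assumptions, not the paper's method."""

    def __init__(self, low, high, bins=10):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.bins = bins
        self.counts = {}  # grid cell -> visit count

    def _cell(self, x):
        # Map each dimension of x into a grid index in [0, bins - 1].
        ratio = (np.asarray(x, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((ratio * self.bins).astype(int), 0, self.bins - 1)
        return tuple(idx)

    def update(self, x):
        # Record one visit to the cell containing x.
        c = self._cell(x)
        self.counts[c] = self.counts.get(c, 0) + 1

    def count(self, x):
        return self.counts.get(self._cell(x), 0)

    def uncertainty(self, x):
        # Common count-based form: the bonus shrinks as visits accumulate,
        # so unvisited (out-of-distribution) cells get the largest value.
        return 1.0 / np.sqrt(self.count(x) + 1)

# Usage: count the dataset's state-action pairs once, then use the
# uncertainty as a penalty on the critic target for candidate actions.
gpc = GridPseudoCount(low=[-1, -1], high=[1, 1], bins=4)
for sa in [[0.1, 0.2], [0.15, 0.25], [0.9, -0.9]]:
    gpc.update(sa)
```

In a SAC-style critic update, such a penalty would typically be subtracted from the Q-target so that the policy avoids state-action regions the offline dataset never covered.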