Overestimation, Overfitting, and Plasticity in Actor-Critic: The Bitter Lesson of Reinforcement Learning
The paper examines how effective various regularization techniques are in off-policy RL and finds that general network regularization methods outperform domain-specific approaches. It also argues that benchmarking across diverse tasks is necessary to understand how these techniques behave.