The author explores the challenges of high update ratios in deep reinforcement learning, focusing on value overestimation and divergence. By addressing these issues through unit-ball normalization, the study challenges the notion that overfitting to early data is the primary cause of learning failure.
Despite high update ratios, deep reinforcement learning can remain effective without resetting network parameters, provided the Q-values are handled correctly.
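As a minimal sketch of what unit-ball feature normalization might look like in practice, the snippet below L2-normalizes the penultimate-layer features of a Q-network before the final linear head, which bounds the scale of the predicted Q-values. The class name `UnitBallQNetwork`, layer sizes, and architecture are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class UnitBallQNetwork(nn.Module):
    """Q-network whose penultimate features are projected onto the unit ball
    before the final linear layer, limiting how large Q-estimates can grow."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        features = self.trunk(torch.cat([obs, act], dim=-1))
        # Unit-ball (L2) normalization: divide the feature vector by its norm
        # so the final linear layer cannot produce unboundedly large Q-values.
        features = features / (features.norm(dim=-1, keepdim=True) + 1e-8)
        return self.head(features)
```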