Key Concepts
Training value functions with categorical cross-entropy, that is, treating value estimation as classification rather than regression, significantly improves the performance and scalability of deep RL across various domains.
Statistics
Value functions are trained using categorical cross-entropy.
HL-Gauss leads to consistently better performance across various domains.
HL-Gauss outperforms MSE in online and offline RL settings.
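The points above name HL-Gauss without saying how it turns a scalar value target into a classification target. The following is a minimal sketch, assuming the common description of HL-Gauss as a histogram loss whose target distribution is a Gaussian centered on the scalar target and integrated over fixed bins. The function names, the value range, the number of bins, and the value of sigma are illustrative assumptions, not taken from the source.

```python
import math


def hl_gauss_target(y, v_min, v_max, num_bins, sigma):
    """Spread a scalar target y over equal-width bins on [v_min, v_max].

    Each bin receives the probability mass that a Gaussian N(y, sigma^2)
    assigns to it (assumed form of the HL-Gauss target).
    """
    width = (v_max - v_min) / num_bins
    edges = [v_min + i * width for i in range(num_bins + 1)]
    # Gaussian CDF centered at y, evaluated at every bin edge.
    cdf = [0.5 * (1.0 + math.erf((e - y) / (sigma * math.sqrt(2.0)))) for e in edges]
    # Renormalize by the mass that falls inside the support.
    total = cdf[-1] - cdf[0]
    return [(cdf[i + 1] - cdf[i]) / total for i in range(num_bins)]


def cross_entropy(logits, target_probs):
    """Categorical cross-entropy between target_probs and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for p, l in zip(target_probs, logits))


def expected_value(logits, v_min, v_max):
    """Recover a scalar value as the softmax-weighted mean of bin centers."""
    num_bins = len(logits)
    width = (v_max - v_min) / num_bins
    centers = [v_min + (i + 0.5) * width for i in range(num_bins)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return sum((e / z) * c for e, c in zip(exps, centers))


# Illustrative usage with assumed hyperparameters.
target = hl_gauss_target(y=0.3, v_min=-1.0, v_max=1.0, num_bins=10, sigma=0.15)
loss = cross_entropy([0.0] * 10, target)
```

In this sketch the network outputs one logit per bin, the loss replaces the squared error used in MSE regression, and `expected_value` converts the predicted distribution back into a scalar when a value estimate is needed.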
Quotes
"Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions."
"Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity."