Key Concepts
OCE RL can be solved optimally by reducing it to standard RL in the augmented MDP (AugMDP).
Summary
The work studies risk-sensitive RL (RSRL) with the Optimized Certainty Equivalent (OCE) risk.
Two meta-algorithms are proposed: one based on optimistic algorithms and one based on policy optimization.
The framework generalizes prior work in RSRL on CVaR and entropic risk.
Empirical validation with PPO shows that the approach learns the optimal CVaR policy.
Discrete rewards ensure computational tractability without sacrificing regret.
Statistics
OCE_u(X) := max_{b ∈ supp(X)} { b + E[u(X - b)] }.
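The OCE definition can be evaluated directly when X is a discrete random variable. The following is a minimal sketch, not code from the paper: the function names `oce` and `cvar_utility` are hypothetical, and the utility u(t) = min(t, 0) / tau is the standard choice assumed here to recover CVaR at level tau.

```python
def oce(values, probs, u):
    """OCE_u(X) = max over b in supp(X) of b + E[u(X - b)] for a discrete X."""
    # The support is the set of values with positive probability.
    support = [x for x, p in zip(values, probs) if p > 0]
    return max(
        b + sum(p * u(x - b) for x, p in zip(values, probs))
        for b in support
    )


def cvar_utility(tau):
    # Assumed utility: u(t) = min(t, 0) / tau, which yields CVaR at level tau.
    return lambda t: min(t, 0.0) / tau


# Toy distribution: X is uniform on {0, 1, 2, 3}.
values = [0.0, 1.0, 2.0, 3.0]
probs = [0.25, 0.25, 0.25, 0.25]

cvar_half = oce(values, probs, cvar_utility(0.5))
print(cvar_half)  # 0.5, the mean of the worst half of outcomes (0 and 1)
```

With tau = 1 the same function returns the plain expectation (1.5 for this toy distribution), which is a quick sanity check on the utility.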
OCE RL can be optimally solved by reducing to standard RL in the AugMDP.
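To illustrate the idea of the reduction, here is a minimal one-step sketch. It assumes that the AugMDP augments the state with a budget b, so that for each fixed b the inner problem is an ordinary expected-utility maximization and the outer step maximizes b plus that value. The toy actions, the CVaR utility, and all names are hypothetical and not taken from the paper.

```python
# Hypothetical one-step problem: each action maps to (reward, probability) pairs.
ACTIONS = {
    "safe": [(1.0, 1.0)],
    "risky": [(0.0, 0.5), (3.0, 0.5)],
}
TAU = 0.5  # CVaR level (assumed choice of OCE utility below)


def u(t):
    # CVaR utility at level TAU: u(t) = min(t, 0) / TAU.
    return min(t, 0.0) / TAU


def solve_augmented(b):
    """Inner problem for a fixed budget b: maximize E[u(R - b)] over actions."""
    scores = {
        action: sum(p * u(r - b) for r, p in outcomes)
        for action, outcomes in ACTIONS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Outer problem: search over candidate budgets (here, all attainable rewards).
budgets = sorted({r for outcomes in ACTIONS.values() for r, _ in outcomes})
results = []
for b in budgets:
    action, value = solve_augmented(b)
    results.append((b + value, b, action))

best_value, best_budget, best_action = max(results)
print(best_value, best_budget, best_action)  # 1.0 1.0 safe
```

The risky action has the higher mean reward (1.5 versus 1.0), but under CVaR at level 0.5 the safe action wins, which is the kind of preference a risk-sensitive objective is meant to capture.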
Quotes
"OCE RL is a general framework for RSRL that can capture a wide gamut of risk measures."
"Our optimistic meta-algorithm unifies almost all prior works in risk-sensitive RL."