Accelerating Policy Optimization through Extremum-Seeking Action Selection
Extremum-Seeking Action Selection (ESA) improves the quality of exploratory actions in policy optimization, reducing the sampling of low-value trajectories and accelerating learning.