Uni-O4 uses on-policy optimization to provide a seamless transition between offline and online learning, improving performance and efficiency in deep reinforcement learning.