We propose a novel model-based reinforcement learning algorithm, Dynamics Learning and predictive control with Parameterized Actions (DLPA), for parameterized action Markov decision processes (PAMDPs); it achieves better sample efficiency and asymptotic performance than state-of-the-art PAMDP methods.
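In a PAMDP, each action pairs a discrete action type with a continuous parameter. A planner has to search over both parts at once. DLPA's own planner and learned models are not specified here, so the following is only a generic illustration and not the DLPA algorithm. It uses plain random-shooting model predictive control, and a hand-written `toy_model` stands in for a learned dynamics model. The names `toy_model` and `plan`, the two action types, and the goal state 3.0 are all hypothetical.

```python
import random
from typing import Callable, Optional, Tuple

# A parameterized action: a discrete action type k plus its continuous parameter x_k.
ParamAction = Tuple[int, float]
Model = Callable[[float, ParamAction], Tuple[float, float]]

def toy_model(state: float, action: ParamAction) -> Tuple[float, float]:
    """Hypothetical stand-in for a learned dynamics model; returns (next_state, reward).
    Action type 0 takes a small step, type 1 a large step; the parameter sets direction and size."""
    k, x = action
    step = 0.5 * x if k == 0 else 2.0 * x
    next_state = state + step
    reward = -abs(next_state - 3.0)  # toy goal at s = 3.0
    return next_state, reward

def plan(state: float, model: Model, num_types: int = 2, horizon: int = 3,
         num_samples: int = 256, rng: Optional[random.Random] = None) -> ParamAction:
    """Random-shooting MPC over a parameterized action space: sample sequences of
    (type, parameter) pairs, score each with the model, return the best sequence's first action."""
    if rng is None:
        rng = random.Random(0)
    best_return, best_first = float("-inf"), (0, 0.0)
    for _ in range(num_samples):
        seq = [(rng.randrange(num_types), rng.uniform(-1.0, 1.0)) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:  # imagined rollout through the model
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first

# Receding horizon: execute only the first planned action, then re-plan from the new state.
rng = random.Random(0)
s = 0.0
for t in range(5):
    a = plan(s, toy_model, rng=rng)
    s, _ = toy_model(s, a)
print(round(s, 2))
```

Sampling the discrete type and its parameter jointly keeps the sketch short. A more sample-efficient planner would refine the sampling distribution between iterations instead of drawing uniformly each time.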
Several major reinforcement learning algorithms can be represented within the framework of categorical cybernetics, which models them as parametrised bidirectional processes.
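A toy example can make "parametrised bidirectional process" more concrete. The sketch below is a minimal parametrised lens: a forward pass that maps an input to an output under parameters `p`, and a backward pass that sends feedback back to both the parameters and the input. It is illustrative only. The names `ParaLens`, `then` and `scale` are hypothetical, and the feedback used here is an ordinary gradient rather than any of the reinforcement learning algorithms treated in the categorical cybernetics framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass(frozen=True)
class ParaLens:
    """Hypothetical parametrised lens: a forward and a backward pass, both indexed by parameters p."""
    forward: Callable[[Any, Any], Any]                    # (p, x) -> y
    backward: Callable[[Any, Any, Any], Tuple[Any, Any]]  # (p, x, dy) -> (dp, dx)

    def then(self, other: "ParaLens") -> "ParaLens":
        """Sequential composition: run self, then other; parameters pair up as (p, q)."""
        def fwd(pq, x):
            p, q = pq
            return other.forward(q, self.forward(p, x))

        def bwd(pq, x, dz):
            p, q = pq
            y = self.forward(p, x)              # recompute the intermediate value
            dq, dy = other.backward(q, y, dz)   # feedback flows back through `other` first
            dp, dx = self.backward(p, x, dy)
            return (dp, dq), dx

        return ParaLens(fwd, bwd)

def scale() -> ParaLens:
    """Hypothetical example lens y = w * x whose backward pass returns gradients (dw, dx)."""
    return ParaLens(
        forward=lambda w, x: w * x,
        backward=lambda w, x, dy: (x * dy, w * dy),
    )

net = scale().then(scale())
print(net.forward((2.0, 5.0), 3.0))        # 30.0
print(net.backward((2.0, 5.0), 3.0, 1.0))  # ((15.0, 6.0), 10.0)
```

Composition threads the forward pass left to right and the feedback right to left. This direction-reversing structure is what makes the processes bidirectional.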