The paper shows that various reinforcement learning (RL) algorithms can be formalized within the framework of categorical cybernetics. The key insights are:
Bellman operators, which are fundamental to both dynamic programming and RL, can be represented as optics: bidirectional processes that map between value functions and their updates.
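As a loose illustration of this shape (a sketch only, not the paper's categorical construction; the names, types, and toy update below are all assumptions), a concrete optic can be modelled as a pair of functions, with a Bellman-style update sitting in the backward pass:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

S = TypeVar("S")  # forward input (e.g. the current value estimate)
T = TypeVar("T")  # backward output (e.g. the updated estimate)
A = TypeVar("A")  # forward output
B = TypeVar("B")  # backward input (e.g. feedback from downstream)

@dataclass
class Optic(Generic[S, T, A, B]):
    """A concrete optic: a forward pass S -> A and a backward
    pass (S, B) -> T that may reuse the forward input."""
    forward: Callable[[S], A]
    backward: Callable[[S, B], T]

gamma = 0.9  # discount factor (assumed value)

# Toy scalar Bellman optic: the forward pass reads off the current
# value estimate; the backward pass applies a one-step Bellman update
# given (reward, next-state value) received from downstream.
bellman_optic: Optic[float, float, float, tuple] = Optic(
    forward=lambda v: v,
    backward=lambda v, rb: rb[0] + gamma * rb[1],  # rb = (reward, next value)
)
```

The bidirectional reading is the point: evaluation flows forward, while the update that rewrites the value function flows backward through the same structure.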
These Bellman operators are extended to parametrised optics that depend on a sample from the environment, capturing the interaction between the agent and the environment.
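A minimal sketch of what such sample-dependence can look like, in the style of a TD(0) update (hypothetical toy code; the learning rate, states, and transition are made up for illustration):

```python
gamma, alpha = 0.9, 0.1  # discount and learning rate (assumed values)

def td0_backward(V, sample):
    """Backward pass parametrised by one environment sample.
    sample = (s, r, s_next): an observed transition and its reward."""
    s, r, s_next = sample
    V = dict(V)  # functional update: return a new value function
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = {0: 0.0, 1: 0.0}
V = td0_backward(V, (0, 1.0, 1))  # one sampled transition 0 -> 1, reward 1.0
```

Here the update rule itself is fixed, but each application is parametrised by a fresh sample drawn from interaction with the environment.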
A representable contravariant functor is applied to the parametrised Bellman operators, yielding a parametrised function that performs the Bellman iteration.
This parametrised function becomes the backward pass of another parametrised optic that represents the model, which interacts with the environment via an agent.
The authors show that many major RL algorithms, such as dynamic programming, Monte Carlo methods, temporal difference learning, and deep RL, can be seen as different extremal cases of this general setup. They argue that this categorical cybernetics approach provides a natural and fruitful way to think about RL.
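In the dynamic-programming extremal case, the environment model is fully known and the Bellman iteration can be run to its fixed point. A minimal value-iteration sketch on a made-up two-state MDP (all numbers here are illustrative assumptions):

```python
gamma = 0.9
# Hypothetical deterministic MDP: P[s][action] = (next_state, reward)
P = {0: {"a": (1, 1.0), "b": (0, 0.0)},
     1: {"a": (0, 0.0), "b": (1, 2.0)}}

def bellman(V):
    """One application of the optimal Bellman operator."""
    return {s: max(r + gamma * V[s2] for (s2, r) in P[s].values())
            for s in P}

V = {s: 0.0 for s in P}
for _ in range(200):  # iterate toward the fixed point
    V = bellman(V)
# converges to V[0] ~= 19, V[1] ~= 20
```

Monte Carlo and temporal difference methods then appear as the sample-driven end of the same spectrum, where the exact expectation inside the Bellman operator is replaced by observed returns or transitions.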