Compositional Conservatism: A Transductive Approach for Improving Offline Reinforcement Learning Performance
Compositional Conservatism with Anchor-seeking (COCOA) is a framework that pursues conservatism in the compositional input space of the policy and Q-function. It does so independently of, and agnostic to, the behavioral conservatism that prevails in offline reinforcement learning.