Mutual Information Regularized Offline Reinforcement Learning Framework
The author proposes MISA, a framework for offline RL that uses the mutual information between states and actions to constrain the direction of policy improvement so that it stays within the dataset manifold.
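To make the regularized quantity concrete, the sketch below computes a plug-in estimate of the mutual information I(S; A) from discrete (state, action) samples. It only illustrates the quantity itself and is not the estimator used by MISA. The function name `empirical_mutual_information` and the toy datasets are assumptions introduced here for illustration.

```python
import math
from collections import Counter


def empirical_mutual_information(pairs):
    """Plug-in estimate of I(S; A) in nats from (state, action) samples.

    Hypothetical helper for illustration only; assumes discrete, hashable
    states and actions and a non-empty list of pairs.
    """
    n = len(pairs)
    joint = Counter(pairs)
    state_counts = Counter(s for s, _ in pairs)
    action_counts = Counter(a for _, a in pairs)

    mi = 0.0
    for (s, a), c in joint.items():
        # p(s, a) * log(p(s, a) / (p(s) * p(a))), written with raw counts.
        mi += (c / n) * math.log(c * n / (state_counts[s] * action_counts[a]))
    return mi


# Toy dataset where the action is fully determined by the state.
deterministic = [(0, "left"), (1, "right")] * 50

# Toy dataset where actions are independent of states.
independent = [(s, a) for s in (0, 1) for a in ("left", "right")] * 25

mi_deterministic = empirical_mutual_information(deterministic)
mi_independent = empirical_mutual_information(independent)
```

In the first toy dataset the estimate equals log 2 nats, since knowing the state removes all uncertainty about a binary action; in the second it is zero, since states carry no information about actions.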