Replacing mean squared error regression with cross-entropy classification for training value functions in offline reinforcement learning can lead to performance improvements in certain algorithms and tasks, particularly those relying heavily on policy regularization, but may not be a universally applicable "plug-and-play" solution.
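As a concrete illustration of the swap (a minimal sketch, not the paper's implementation), the scalar MSE head is replaced by a categorical head over fixed value bins trained with cross-entropy; the two-hot encoding, bin range, and all names below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch: train the value head as a classifier over NUM_BINS fixed
# value bins (two-hot targets + cross-entropy) instead of scalar MSE regression.
NUM_BINS = 51
V_MIN, V_MAX = -100.0, 100.0
support = torch.linspace(V_MIN, V_MAX, NUM_BINS)  # bin centers

def two_hot(target: torch.Tensor) -> torch.Tensor:
    """Encode scalar TD targets as a distribution over the two nearest bins."""
    target = target.clamp(V_MIN, V_MAX)
    idx = (target - V_MIN) / (V_MAX - V_MIN) * (NUM_BINS - 1)
    lower = idx.floor().long().clamp(0, NUM_BINS - 1)
    upper = (lower + 1).clamp(0, NUM_BINS - 1)
    upper_w = (idx - lower.float()).unsqueeze(-1)
    probs = torch.zeros(*target.shape, NUM_BINS)
    probs.scatter_add_(-1, lower.unsqueeze(-1), 1.0 - upper_w)
    probs.scatter_add_(-1, upper.unsqueeze(-1), upper_w)
    return probs

def classification_value_loss(value_logits: torch.Tensor, td_target: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against the two-hot target; replaces F.mse_loss(value, td_target)."""
    return F.cross_entropy(value_logits, two_hot(td_target))

def scalar_value(value_logits: torch.Tensor) -> torch.Tensor:
    """Recover a scalar estimate for bootstrapping: expectation under the categorical head."""
    return (value_logits.softmax(-1) * support).sum(-1)
```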
Factoring action spaces and employing value decomposition, as exemplified by DecQN, significantly improves the efficiency and performance of offline reinforcement learning in complex environments, particularly when dealing with limited or suboptimal data.
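A minimal sketch of what such a decoupled critic can look like, assuming each action dimension is discretized into bins and per-dimension utilities are combined by their mean; class and argument names are illustrative rather than DecQN's actual code.

```python
import torch
import torch.nn as nn

# Illustrative sketch of DecQN-style value decomposition: one utility head per
# action dimension, joint Q-value taken as the mean of the chosen sub-action utilities.
class DecoupledQNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_dims: int, bins_per_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, bins_per_dim) for _ in range(num_dims)]
        )

    def utilities(self, obs: torch.Tensor) -> torch.Tensor:
        h = self.trunk(obs)
        return torch.stack([head(h) for head in self.heads], dim=1)  # (B, num_dims, bins)

    def q_value(self, obs: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        """actions: (B, num_dims) integer bin indices; Q is the mean per-dimension utility."""
        util = self.utilities(obs)
        chosen = util.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (B, num_dims)
        return chosen.mean(dim=-1)

    def greedy_action(self, obs: torch.Tensor) -> torch.Tensor:
        """Argmax decomposes per dimension, avoiding the exponential joint action space."""
        return self.utilities(obs).argmax(dim=-1)                    # (B, num_dims)
```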
Decision Transformer (DT), a sequence modeling approach to offline reinforcement learning, often converges to sub-optimal trajectories. This paper proposes a novel method, Diffusion-Based Trajectory Branch Generation (BG), to enhance DT's performance by expanding the dataset with generated trajectory branches leading to higher returns, thus enabling DT to learn better policies.
Mild generalization in both action selection and value propagation, that is, querying actions in a close neighborhood of the dataset and limiting how the resulting value generalization propagates through bootstrapped targets, can be effectively leveraged to improve the performance of offline reinforcement learning algorithms.
This paper introduces Offline RL-VLM-F, a novel system that leverages vision-language models (VLMs) to automatically generate reward labels for unlabeled datasets, enabling offline reinforcement learning for complex real-world robotics tasks, such as robot-assisted dressing, and outperforming existing baselines in various simulated manipulation tasks.
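The sketch below is a schematic of this kind of reward-labeling pipeline rather than the paper's system: a hypothetical query_vlm_preference call stands in for the VLM, and its preference labels train a Bradley-Terry-style reward model that can then relabel the offline dataset for any standard offline RL algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Schematic of VLM-based reward labeling for an unlabeled offline dataset.
# `query_vlm_preference` is a hypothetical stand-in for a real vision-language model
# call that, given a task description and two image observations, answers which image
# better accomplishes the task (0 for the first, 1 for the second).
def query_vlm_preference(task_description: str, image_a, image_b) -> int:
    raise NotImplementedError("stand-in for an actual VLM query")

class RewardModel(nn.Module):
    """Maps an image observation to a scalar reward."""
    def __init__(self, encoder: nn.Module, feat_dim: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(image)).squeeze(-1)

def preference_loss(reward_model: RewardModel, img_a: torch.Tensor,
                    img_b: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss; `label` holds the index (0 or 1) the VLM preferred."""
    logits = torch.stack([reward_model(img_a), reward_model(img_b)], dim=-1)
    return F.cross_entropy(logits, label)
```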
C-LAP, a novel model-based offline reinforcement learning method, leverages a generative model of the joint state-action distribution and a constrained policy optimization approach to enhance performance and mitigate value overestimation, particularly excelling in scenarios with visual observations.
The hypercube policy regularization framework improves offline reinforcement learning by allowing the agent to draw on actions taken in similar states, namely states falling within the same hypercube of the state space, striking a balance between conservatism and aggressiveness for better policy learning.
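One way to picture the mechanism (a sketch under assumed details, not the paper's code): bucket states into hypercubes of a fixed cell size and penalize the policy only by its distance to the nearest dataset action observed anywhere in the same cube, rather than to the single action stored at that exact state.

```python
import numpy as np
import torch

# Illustrative hypercube-style regularization penalty; cell size and names are assumptions.
def hypercube_key(state: np.ndarray, cell_size: float) -> tuple:
    return tuple(np.floor(state / cell_size).astype(int))

def build_action_buckets(states: np.ndarray, actions: np.ndarray, cell_size: float) -> dict:
    """Group dataset actions by the hypercube their state falls into."""
    buckets = {}
    for s, a in zip(states, actions):
        buckets.setdefault(hypercube_key(s, cell_size), []).append(a)
    return {k: np.stack(v) for k, v in buckets.items()}

def hypercube_bc_penalty(policy_action: torch.Tensor, state: np.ndarray,
                         buckets: dict, cell_size: float) -> torch.Tensor:
    """Distance from the policy action to the closest in-cube dataset action.
    For states drawn from the dataset, their own cube is guaranteed to exist."""
    candidates = torch.as_tensor(buckets[hypercube_key(state, cell_size)],
                                 dtype=policy_action.dtype)
    dists = ((candidates - policy_action) ** 2).sum(dim=-1)
    return dists.min()
```

One natural use of such a penalty, as an illustration rather than the paper's prescription, is to substitute it for the per-state behavior-cloning term in a TD3+BC-style objective.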
This paper proposes a novel framework for domain adaptation in offline reinforcement learning (RL) with limited target samples, theoretically analyzing the trade-off between leveraging a large, related source dataset and a limited target dataset, and providing empirical validation on the Procgen benchmark.
Diffusion Trusted Q-Learning (DTQL) leverages the expressiveness of diffusion policies for behavior cloning while employing a novel diffusion trust region loss to guide a computationally efficient one-step policy for superior performance in offline reinforcement learning tasks.
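The rough shape of such an objective is sketched below under assumed interfaces: a frozen behavior diffusion model scores how plausible the one-step policy's actions are under the data, while a critic pushes them toward higher value. Function signatures and the exact loss form are assumptions, not DTQL's API.

```python
import torch

# Rough sketch: `eps_model` is a frozen noise-prediction network pretrained by
# behavior cloning on the dataset; its denoising error at the policy's action acts
# as a trust-region penalty, combined with Q-value maximization for the one-step policy.
def trust_region_loss(eps_model, policy_action, state, alphas_bar):
    """Denoising error of the frozen behavior diffusion model at the policy's action."""
    t = torch.randint(0, len(alphas_bar), (state.shape[0],))
    a_bar = alphas_bar[t].unsqueeze(-1)
    noise = torch.randn_like(policy_action)
    noisy_action = a_bar.sqrt() * policy_action + (1 - a_bar).sqrt() * noise
    return ((eps_model(noisy_action, state, t) - noise) ** 2).mean()

def one_step_policy_loss(policy, q_net, eps_model, state, alphas_bar, alpha=1.0):
    """Maximize Q while staying inside the diffusion-defined trust region."""
    action = policy(state)
    return alpha * trust_region_loss(eps_model, action, state, alphas_bar) \
           - q_net(state, action).mean()
```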
Diffusion-DICE is a novel offline reinforcement learning algorithm that leverages diffusion models to transform the behavior policy distribution into an optimal policy distribution, achieving state-of-the-art performance by minimizing error exploitation in value function approximation.
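A minimal sketch of the "select among in-support candidates" step that this kind of approach relies on (the guided diffusion sampler itself is omitted, and the callable names are hypothetical): every candidate comes from the transformed behavior distribution, and the critic is used only to choose among them, which limits how much value-function error can be exploited.

```python
import torch

# `sample_candidates(state, n)` is assumed to return (n, action_dim) actions drawn
# from the guided diffusion model; `q_net(states, actions)` returns per-pair Q-values.
@torch.no_grad()
def select_action(sample_candidates, q_net, state: torch.Tensor,
                  num_candidates: int = 32) -> torch.Tensor:
    """Generate in-support candidate actions, then pick the one the critic scores highest."""
    candidates = sample_candidates(state, num_candidates)
    q_values = q_net(state.expand(num_candidates, -1), candidates)
    return candidates[q_values.argmax()]
```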