This paper introduces Offline RL-VLM-F, a novel system that leverages vision-language models (VLMs) to automatically generate reward labels for unlabeled datasets, enabling offline reinforcement learning for complex real-world robotics tasks such as robot-assisted dressing; the method also outperforms existing baselines on a range of simulated manipulation tasks.
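A minimal sketch of the reward-labeling stage, assuming an offline dataset of image observations and a hypothetical `query_vlm_preference` helper; the paper's actual prompting strategy and reward-model architecture may differ. The relabeled transitions would then be handed to a standard offline RL algorithm.

```python
# Minimal sketch of VLM-based reward labeling for an offline dataset.
# `query_vlm_preference` is a hypothetical helper (not part of the paper's
# released code) that asks a VLM which of two frames shows more task progress.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def label_dataset_with_vlm(dataset, query_vlm_preference, task_description,
                           obs_dim: int, num_queries: int = 1000,
                           epochs: int = 10):
    """Query a VLM for preferences over observation pairs, fit a reward model
    with a Bradley-Terry-style preference loss, then relabel the dataset."""
    reward_model = RewardModel(obs_dim)
    optim = torch.optim.Adam(reward_model.parameters(), lr=3e-4)

    # 1) Collect preference labels on randomly chosen observation pairs.
    pairs, labels = [], []
    for _ in range(num_queries):
        i, j = torch.randint(len(dataset["obs"]), (2,)).tolist()
        label = query_vlm_preference(dataset["image"][i], dataset["image"][j],
                                     task_description)  # returns 0 or 1
        pairs.append((dataset["obs"][i], dataset["obs"][j]))
        labels.append(label)

    # 2) Fit the reward model so preferred observations score higher.
    for _ in range(epochs):
        for (o1, o2), y in zip(pairs, labels):
            logits = torch.stack([reward_model(o1), reward_model(o2)])
            loss = nn.functional.cross_entropy(logits.unsqueeze(0),
                                               torch.tensor([y]))
            optim.zero_grad()
            loss.backward()
            optim.step()

    # 3) Use the learned reward model to label every transition.
    with torch.no_grad():
        dataset["reward"] = reward_model(dataset["obs"])
    return dataset
```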
C-LAP, a novel model-based offline reinforcement learning method, leverages a generative model of the joint state-action distribution and a constrained policy optimization approach to enhance performance and mitigate value overestimation, particularly excelling in scenarios with visual observations.
The hypercube policy regularization framework improves offline reinforcement learning by allowing agents to explore actions corresponding to similar states within a hypercube, striking a balance between conservatism and aggressiveness for better policy learning.
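A rough sketch of the core idea as stated above, assuming states are bucketed into axis-aligned hypercube cells of side `delta` and the regularizer is the distance from the policy's action to the nearest dataset action in the same cell; both the cell size and the exact penalty form are illustrative choices, not the paper's formulation.

```python
# Rough sketch of hypercube-style policy regularization: instead of pulling
# the policy toward the single dataset action for a state, allow it to match
# any action observed within the same state hypercube cell.
import numpy as np
from collections import defaultdict

def build_hypercube_index(states: np.ndarray, actions: np.ndarray, delta: float):
    """Bucket dataset actions by the hypercube cell of their state."""
    index = defaultdict(list)
    for s, a in zip(states, actions):
        cell = tuple(np.floor(s / delta).astype(int))
        index[cell].append(a)
    return {cell: np.stack(acts) for cell, acts in index.items()}

def hypercube_regularizer(policy_action: np.ndarray, state: np.ndarray,
                          index, delta: float) -> float:
    """Squared distance to the nearest in-cell dataset action.
    Falls back to zero penalty if the cell is empty (illustrative choice)."""
    cell = tuple(np.floor(state / delta).astype(int))
    candidates = index.get(cell)
    if candidates is None:
        return 0.0
    dists = np.sum((candidates - policy_action) ** 2, axis=1)
    return float(dists.min())
```

In an actor-critic setup this penalty would sit alongside the usual Q-maximization term; a larger `delta` admits more dataset actions per cell and is therefore less conservative.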
This paper proposes a novel framework for domain adaptation in offline reinforcement learning (RL) with limited target samples, theoretically analyzing the trade-off between leveraging a large, related source dataset and relying on a limited target dataset, and providing empirical validation on the Procgen benchmark.
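A toy illustration of that trade-off, assuming both domains are simply mixed in the training loss: the source data is down-weighted by a scalar `lam`, whereas the paper's contribution is a theoretically derived choice of this weighting.

```python
# Toy sketch of the source/target trade-off: train on a mixture of the large
# source dataset and the small target dataset, down-weighting source samples
# by a coefficient `lam`. lam -> 0 ignores the source domain entirely;
# lam -> 1 treats the two domains as identical.
import torch

def mixed_domain_loss(loss_fn, model, target_batch, source_batch, lam: float):
    """Weighted sum of the target-domain loss and the source-domain loss."""
    target_loss = loss_fn(model, target_batch)
    source_loss = loss_fn(model, source_batch)
    return target_loss + lam * source_loss
```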
Diffusion Trusted Q-Learning (DTQL) leverages the expressiveness of diffusion policies for behavior cloning while employing a novel diffusion trust region loss to guide a computationally efficient one-step policy for superior performance in offline reinforcement learning tasks.
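A schematic of the two-policy setup described above, assuming a frozen diffusion model fitted to the behavior data: the one-step policy maximizes Q while a trust-region penalty, computed from the diffusion model's denoising error at the policy's action, keeps it in-distribution. The penalty form and the `add_noise`/forward interfaces below are illustrative assumptions, not the paper's exact loss or API.

```python
# Schematic of a DTQL-style actor objective: a one-step policy maximizes Q
# while a trust-region penalty from a (frozen) behavior diffusion model keeps
# its actions in high-density regions of the dataset.
import torch

def diffusion_denoise_loss(diffusion_model, state, action):
    """Denoising error of the behavior diffusion model at (state, action);
    low error ~ the action lies in a high-density region of the data."""
    t = torch.randint(0, diffusion_model.num_steps, (action.shape[0],),
                      device=action.device)
    noise = torch.randn_like(action)
    noisy_action = diffusion_model.add_noise(action, noise, t)   # assumed API
    pred_noise = diffusion_model(noisy_action, state, t)          # assumed API
    return ((pred_noise - noise) ** 2).mean()

def one_step_actor_loss(one_step_policy, q_net, diffusion_model,
                        state, alpha: float = 1.0):
    action = one_step_policy(state)                 # single forward pass
    q_term = -q_net(state, action).mean()           # maximize Q
    trust_term = diffusion_denoise_loss(diffusion_model, state, action)
    return q_term + alpha * trust_term
```

At inference only the cheap one-step policy is called; the diffusion model is used purely as a training-time trust region.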
Diffusion-DICE is a novel offline reinforcement learning algorithm that leverages diffusion models to transform the behavior policy distribution into an optimal policy distribution, achieving state-of-the-art performance by minimizing error exploitation in value function approximation.
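One plausible reading of how such a method limits error exploitation is an in-support "select" step: candidate actions come only from the learned diffusion model, and the value function merely ranks them. The sketch below assumes a `diffusion_policy.sample` interface and is not the paper's exact procedure.

```python
# Sketch of an in-support select step: draw several candidate actions from a
# diffusion model fitted to (a transformation of) the behavior distribution,
# then keep the candidate the learned Q-function ranks highest. Restricting
# the argmax to model samples is what limits exploitation of Q-function errors.
import torch

@torch.no_grad()
def select_action(diffusion_policy, q_net, state, num_candidates: int = 32):
    states = state.unsqueeze(0).repeat(num_candidates, 1)  # (N, state_dim)
    candidates = diffusion_policy.sample(states)            # (N, act_dim), assumed API
    q_values = q_net(states, candidates).squeeze(-1)        # (N,)
    return candidates[q_values.argmax()]
```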
LPT, a novel generative model, effectively performs planning in offline reinforcement learning by leveraging a latent variable to connect trajectory generation with final returns, achieving temporal consistency and outperforming existing methods in challenging tasks.
This paper proposes a new method called Offline Behavior Distillation (OBD), which distills compact expert behavior data from large amounts of suboptimal reinforcement learning data, enabling fast and efficient policy learning.
Mamba, a linear-time sequence model, can be effectively adapted for trajectory optimization in offline reinforcement learning, achieving comparable or superior performance to transformer-based methods while using significantly fewer parameters.
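A sketch of the trajectory-optimization framing, assuming the standard Decision-Transformer-style tokenization of (return-to-go, state, action) triples; the sequence backbone is injected, so a linear-time Mamba block (e.g. from the `mamba_ssm` package) can replace a Transformer without changing the rest of the model. Details of the paper's architecture may differ.

```python
# Sketch of a trajectory model with a swappable sequence backbone: interleave
# (return-to-go, state, action) tokens and let any (B, L, D) -> (B, L, D)
# module -- such as a Mamba block -- mix them over time.
import torch
import torch.nn as nn

class TrajectorySequenceModel(nn.Module):
    def __init__(self, state_dim, act_dim, d_model, backbone: nn.Module):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.backbone = backbone        # e.g. mamba_ssm.Mamba(d_model=d_model)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim)
        B, T = states.shape[:2]
        tokens = torch.stack(
            [self.embed_rtg(rtg), self.embed_state(states),
             self.embed_action(actions)], dim=2)       # (B, T, 3, D)
        tokens = tokens.reshape(B, 3 * T, -1)          # interleave R, s, a
        hidden = self.backbone(tokens)                 # (B, 3T, D)
        # Predict each action from the hidden state of its state token.
        return self.predict_action(hidden[:, 1::3])    # (B, T, act_dim)
```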
Branch Value Estimation (BVE) is a novel offline reinforcement learning method that effectively addresses the challenges of learning in large, discrete combinatorial action spaces by representing the action space as a tree and learning to evaluate only a small subset of actions at each timestep.
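A sketch of tree-structured action selection under the assumptions that a combinatorial action is a vector of discrete sub-actions and that a branch-value network can score partial action prefixes; the padding scheme and network interface are illustrative, not the paper's.

```python
# Sketch of tree-structured selection for a combinatorial action space:
# a branch-value network scores partial action prefixes, and a greedy descent
# fixes one sub-action per level, so only a small subset of actions is ever
# evaluated. The branch_value_net interface is an illustrative assumption.
import torch

def greedy_tree_action(branch_value_net, state, num_sub_actions, choices_per_sub):
    """Pick a combinatorial action by descending the sub-action tree greedily."""
    prefix = []
    for depth in range(num_sub_actions):
        best_choice, best_value = None, -float("inf")
        for choice in range(choices_per_sub):
            candidate = prefix + [choice]
            # Pad the partial action so the network sees a fixed-size input.
            padded = candidate + [-1] * (num_sub_actions - len(candidate))
            value = branch_value_net(state, torch.tensor(padded)).item()
            if value > best_value:
                best_choice, best_value = choice, value
        prefix.append(best_choice)
    return prefix  # one choice per sub-action
```

This evaluates only `num_sub_actions * choices_per_sub` branch values per decision instead of enumerating all `choices_per_sub ** num_sub_actions` combinatorial actions.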