This work addresses Continual Offline Reinforcement Learning (CORL), which aims to let an agent learn a sequence of tasks from static offline datasets while adapting to new tasks. Existing methods built on Actor-Critic (AC) structures face challenges such as distribution shift, low learning efficiency, and limited knowledge sharing across tasks.
The authors propose that Decision Transformer (DT), another offline RL paradigm, can serve as a more suitable offline continual learner. DT offers advantages in learning efficiency, distribution-shift mitigation, and zero-shot generalization, but its supervised parameter updates exacerbate the forgetting problem.
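For context, the sketch below illustrates the return-conditioned sequence modeling that DT performs: return-to-go, state, and action tokens pass through a causally masked transformer, and actions are predicted by supervised learning. This is a minimal illustration assuming a PyTorch `nn.TransformerEncoder` backbone; the module names, dimensions, and BC-style objective are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class MinimalDT(nn.Module):
    """Return-conditioned sequence model: tokens are (return-to-go, state, action)."""
    def __init__(self, state_dim, act_dim, d_model=128, n_layers=3, n_heads=4, max_len=20):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions, timesteps):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T, act_dim), timesteps: (B, T)
        t_emb = self.embed_time(timesteps)
        tokens = torch.stack([
            self.embed_rtg(rtg) + t_emb,
            self.embed_state(states) + t_emb,
            self.embed_action(actions) + t_emb,
        ], dim=2).reshape(rtg.shape[0], -1, t_emb.shape[-1])   # (B, 3T, d_model)
        # Causal mask: each token may attend only to earlier tokens.
        L = tokens.shape[1]
        mask = torch.triu(torch.ones(L, L, dtype=torch.bool, device=tokens.device), diagonal=1)
        h = self.backbone(tokens, mask=mask)
        # Predict the next action from each state token (every 3rd position, offset 1).
        return self.predict_action(h[:, 1::3])
```

Training then reduces to supervised regression of the predicted actions onto the dataset actions (e.g., an MSE loss), which is what makes DT efficient but also prone to forgetting when the same parameters are overwritten by each new task.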
To address this forgetting problem, the authors introduce two new DT-based methods:
Multi-Head DT (MH-DT): Uses multiple heads to store task-specific knowledge while sharing knowledge through common components, and employs distillation and selective rehearsal to enhance current-task learning (see the sketch after this list).
Low-Rank Adaptation DT (LoRA-DT): Merges less influential weights and fine-tunes the MLP layers inside DT blocks with LoRA to adapt to the current task, without requiring a replay buffer (see the second sketch after this list).
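A minimal sketch of the multi-head idea, assuming a shared backbone with one linear action head per task and a frozen copy of the previous model as the distillation teacher; the class names, the replay-batch interface for selective rehearsal, and the loss weight `alpha` are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadDT(nn.Module):
    """Shared DT backbone (common knowledge) with one action head per task."""
    def __init__(self, backbone: nn.Module, d_model: int, act_dim: int):
        super().__init__()
        self.backbone = backbone                 # shared across tasks
        self.heads = nn.ModuleDict()             # task-specific knowledge
        self.d_model, self.act_dim = d_model, act_dim

    def add_task(self, task_id: str):
        self.heads[task_id] = nn.Linear(self.d_model, self.act_dim)

    def forward(self, tokens, task_id: str):
        h = self.backbone(tokens)                # (B, T, d_model)
        return self.heads[task_id](h)

def corl_step(model, old_model, batch, replay_batch, task_id, prev_task_id, alpha=0.5):
    """One update: behavior cloning on the current task, plus distillation on a
    selectively rehearsed batch from earlier tasks (weighting is an assumption)."""
    pred = model(batch["tokens"], task_id)
    loss = F.mse_loss(pred, batch["actions"])             # current-task loss
    if prev_task_id is not None:
        with torch.no_grad():                             # frozen previous model as teacher
            teacher = old_model(replay_batch["tokens"], prev_task_id)
        student = model(replay_batch["tokens"], prev_task_id)
        loss = loss + alpha * F.mse_loss(student, teacher)
    return loss
```

Here `replay_batch` stands for a small buffer of selected old-task trajectories, so forgetting is countered both by keeping old heads and by distilling their behavior through the shared backbone.

A minimal sketch of LoRA-style fine-tuning of the feed-forward (MLP) layers in a DT block, assuming frozen base weights plus trainable low-rank factors; the attribute names (`mlp`, `fc1`, `fc2`), the rank, and the scaling are illustrative assumptions about the block layout.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (W + B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

def attach_lora_to_mlp(dt_block: nn.Module, rank: int = 8):
    """Wrap the feed-forward linears of one DT block with LoRA adapters.
    Attribute names ('mlp', 'fc1', 'fc2') are assumptions about the block layout."""
    dt_block.mlp.fc1 = LoRALinear(dt_block.mlp.fc1, rank)
    dt_block.mlp.fc2 = LoRALinear(dt_block.mlp.fc2, rank)
    return dt_block
```

Because only the small (A, B) factor pairs are trained and stored per task, adaptation needs no replay buffer and the per-task memory footprint stays low.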
Experiments on MuJoCo and Meta-World benchmarks demonstrate that the proposed methods outperform SOTA CORL baselines, showcase enhanced learning capabilities, and are more memory-efficient.