VQ-CD is a new framework that tackles continual offline reinforcement learning (CORL) over diverse task sequences through quantized space alignment (QSA) and selective weight activation (SWA).
VQ-CD leverages vector quantization to align diverse state and action spaces, enabling a diffusion-based model with selective weight activation to effectively learn and retain knowledge across a sequence of offline reinforcement learning tasks, even with varying state and action dimensions.
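To make the two mechanisms concrete, here is a minimal sketch (not the VQ-CD authors' code) of the idea: per-task encoders project differently sized states into a shared space, a learned codebook vector-quantizes them, and a linear layer uses per-task binary masks so only a task-specific subset of weights is activated. All names, dimensions, and the random-mask rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Maps continuous features to the nearest entry of a learned codebook."""
    def __init__(self, codebook_size=64, dim=32):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z):
        d = torch.cdist(z, self.codebook.weight)     # distances to all codebook entries
        idx = d.argmin(dim=-1)                       # nearest code per sample
        z_q = self.codebook(idx)                     # quantized features
        # straight-through estimator so gradients still reach the encoder
        return z + (z_q - z).detach(), idx

class TaskMaskedLinear(nn.Module):
    """Linear layer whose weights are selectively activated by per-task binary masks."""
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # one fixed binary mask per task; only masked-in weights are used for that task
        self.register_buffer("masks", (torch.rand(num_tasks, out_dim, in_dim) > 0.5).float())

    def forward(self, x, task_id):
        return x @ (self.weight * self.masks[task_id]).t()

# Per-task encoders project differently sized states into the shared quantized space.
encoders = nn.ModuleList([nn.Linear(d, 32) for d in (11, 17)])   # assumed state dims
vq = VectorQuantizer()
head = TaskMaskedLinear(32, 6, num_tasks=2)                      # assumed action dim

state = torch.randn(4, 17)                                       # batch from task 1
z_q, codes = vq(encoders[1](state))
action_logits = head(z_q, task_id=1)
```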
With offline experience replay (OER), an agent can continually learn new skills from a sequence of pre-collected offline datasets; the method addresses both catastrophic forgetting and distribution shift.
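A minimal sketch of the replay pattern behind such a method, not the OER paper's implementation: keep a small subset of transitions from each finished task and mix them into training batches for the new task. The buffer budget, mixing ratio, and the uniform-random selection rule here are assumptions, not the paper's selection scheme.

```python
import random

class OfflineReplay:
    def __init__(self, per_task_budget=1000):
        self.per_task_budget = per_task_budget
        self.retained = []                      # transitions kept from past tasks

    def finish_task(self, task_dataset):
        # keep a fixed-size subset of the finished task's offline dataset
        k = min(self.per_task_budget, len(task_dataset))
        self.retained.extend(random.sample(task_dataset, k))

    def sample_batch(self, current_dataset, batch_size=256, replay_ratio=0.25):
        # mix current-task data with replayed transitions from earlier tasks
        n_replay = int(batch_size * replay_ratio) if self.retained else 0
        batch = random.sample(current_dataset, batch_size - n_replay)
        if n_replay:
            batch += random.choices(self.retained, k=n_replay)
        return batch

# usage with toy (state, action, reward, next_state) tuples
buffer = OfflineReplay()
task0 = [(i, 0, 1.0, i + 1) for i in range(5000)]
buffer.finish_task(task0)
task1 = [(i, 1, 0.5, i + 1) for i in range(5000)]
batch = buffer.sample_batch(task1)
```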
A dual generative replay framework leverages diffusion models to model states and behaviors with high fidelity, allowing the continual-learning policy to inherit their distributional expressivity and mitigating catastrophic forgetting.
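The structure of that dual replay loop is sketched below; this is not the paper's code, and the diffusion models are deliberately replaced by trivial Gaussian generators so the sketch stays short. What it illustrates is only the loop: a state generator and a behavior generator for past tasks supply pseudo (state, action) pairs that are trained on alongside the current task's real offline data. All dimensions and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class GaussianGenerator(nn.Module):
    """Stand-in for a diffusion model: fits a diagonal Gaussian and samples from it."""
    def __init__(self, dim):
        super().__init__()
        self.mean = nn.Parameter(torch.zeros(dim))
        self.log_std = nn.Parameter(torch.zeros(dim))

    def fit(self, data, steps=200, lr=1e-2):
        opt = torch.optim.Adam(self.parameters(), lr=lr)
        for _ in range(steps):
            nll = 0.5 * (((data - self.mean) / self.log_std.exp()) ** 2).mean() + self.log_std.mean()
            opt.zero_grad(); nll.backward(); opt.step()

    def sample(self, n):
        return (self.mean + torch.randn(n, self.mean.numel()) * self.log_std.exp()).detach()

state_dim, action_dim = 8, 2
state_gen, behavior_gen = GaussianGenerator(state_dim), GaussianGenerator(action_dim)
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

# generators fitted on the previous task's offline data (toy data here)
old_states, old_actions = torch.randn(2000, state_dim), torch.randn(2000, action_dim)
state_gen.fit(old_states); behavior_gen.fit(old_actions)

# training on the new task: real data plus generated pseudo-data from the old task
new_states, new_actions = torch.randn(2000, state_dim) + 1.0, torch.randn(2000, action_dim)
for _ in range(100):
    fake_s, fake_a = state_gen.sample(128), behavior_gen.sample(128)
    s = torch.cat([new_states[:128], fake_s]); a = torch.cat([new_actions[:128], fake_a])
    loss = ((policy(s) - a) ** 2).mean()        # behavior cloning on the mixed batch
    opt.zero_grad(); loss.backward(); opt.step()
```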
Decision Transformer can serve as a more suitable offline continual learner than actor-critic-based algorithms, but it still suffers from catastrophic forgetting. The proposed methods MH-DT and LoRA-DT address this by exploiting the transformer structure to store and transfer knowledge across tasks.
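A minimal sketch of the LoRA-style mechanism behind LoRA-DT, not the authors' code: a base linear weight (standing in for a Decision Transformer projection) is frozen, and each task gets its own low-rank update, so training a new task does not overwrite weights that earlier tasks rely on. The rank, dimensions, and class names are assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen shared linear layer plus one trainable low-rank adapter per task."""
    def __init__(self, in_dim, out_dim, num_tasks, rank=4):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)            # shared weights stay frozen
        self.base.bias.requires_grad_(False)
        # one low-rank pair per task; B starts at zero so the adapter is initially inert
        self.A = nn.ParameterList([nn.Parameter(torch.randn(rank, in_dim) * 0.01)
                                   for _ in range(num_tasks)])
        self.B = nn.ParameterList([nn.Parameter(torch.zeros(out_dim, rank))
                                   for _ in range(num_tasks)])

    def forward(self, x, task_id):
        delta = self.B[task_id] @ self.A[task_id]         # task-specific low-rank update
        return self.base(x) + x @ delta.t()

layer = LoRALinear(in_dim=128, out_dim=128, num_tasks=3)
tokens = torch.randn(4, 20, 128)                          # (batch, sequence, embedding)
out = layer(tokens, task_id=2)                            # task 2 uses only its adapter
```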