The paper introduces the DG-MORL algorithm, which addresses the challenges in multi-objective reinforcement learning (MORL) by leveraging prior demonstrations as guidance. The key highlights are:
Demonstration-Preference Alignment: The algorithm computes corner weights to align the demonstrations with user preferences, addressing the demonstration-preference misalignment issue.
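For concreteness, here is a minimal sketch of how corner weights can be computed in the two-objective case, assuming linear scalarization over the multi-objective returns of the demonstrations. The function name corner_weights_2d and the toy return vectors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def corner_weights_2d(value_vectors):
    """Corner weights for two objectives: the weightings w = (w, 1 - w) at which the
    best linearly-scalarized return switches from one demonstration to another.
    `value_vectors` has shape (n, 2): one multi-objective return per demonstration."""
    V = np.asarray(value_vectors, dtype=float)
    corners = {0.0, 1.0}                      # extrema of the weight simplex are always corners
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dv = V[i] - V[j]
            denom = dv[0] - dv[1]
            if abs(denom) < 1e-12:
                continue                      # parallel scalarized values: no crossing
            w = -dv[1] / denom                # solve w*dv[0] + (1 - w)*dv[1] = 0
            if 0.0 < w < 1.0:
                # keep the crossing only if both vectors are maximal there
                scal = V @ np.array([w, 1.0 - w])
                if np.isclose(scal[i], scal.max()) and np.isclose(scal[j], scal.max()):
                    corners.add(round(w, 10))
    return sorted(corners)

# Example: two demonstrations trading off objective 0 against objective 1
print(corner_weights_2d([[10.0, 2.0], [4.0, 8.0]]))   # [0.0, 0.5, 1.0]
```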
Self-Evolving Mechanism: A self-evolving mechanism is introduced to continuously update and improve the guiding demonstrations, mitigating the impact of sub-optimal initial demonstrations and addressing the demonstration deadlock problem.
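A hedged sketch of the self-evolving idea, again assuming linear scalarization: whenever the agent's own rollout outperforms the stored guiding demonstration for a given preference weight, it replaces that demonstration, so sub-optimal initial guides are gradually overwritten. The Trajectory class and update_guides helper are illustrative names, not the paper's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np

@dataclass
class Trajectory:
    states: List           # visited states (placeholder)
    actions: List          # actions taken (placeholder)
    returns: np.ndarray    # multi-objective return of the trajectory

def update_guides(guides: Dict[Tuple[float, ...], Trajectory],
                  rollout: Trajectory,
                  weight: np.ndarray) -> Dict[Tuple[float, ...], Trajectory]:
    """If the agent's latest rollout beats the stored guiding demonstration under the
    current preference weight, adopt it as the new guide for that weight."""
    key = tuple(np.round(weight, 6))
    new_score = float(weight @ rollout.returns)
    old = guides.get(key)
    if old is None or new_score > float(weight @ old.returns):
        guides[key] = rollout          # better behaviour found: the guide set evolves
    return guides

# Usage: an initially sub-optimal guide is replaced once the agent does better.
guides = {}
w = np.array([0.5, 0.5])
guides = update_guides(guides, Trajectory([], [], np.array([3.0, 3.0])), w)
guides = update_guides(guides, Trajectory([], [], np.array([5.0, 4.0])), w)
print(guides[tuple(np.round(w, 6))].returns)   # [5. 4.]
```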
Multi-Stage Curriculum: A multi-stage curriculum is implemented to facilitate a smooth transition from the guide policy to the exploration policy, addressing the policy shift challenge.
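The curriculum can be pictured roughly as follows: the guide policy (demonstration replay) controls the first part of each episode, the exploration policy takes over afterwards, and the handover point is moved earlier as the learner improves. The sketch below assumes a Gymnasium-style environment with vector rewards; run_episode, next_switch_step, and the step schedule are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def run_episode(env, guide_actions, explore_policy, switch_step, max_steps=200):
    """One curriculum episode (sketch): replay the guiding demonstration for the first
    `switch_step` steps, then hand control to the learning (exploration) policy."""
    obs, _ = env.reset()
    total = None
    for t in range(max_steps):
        if t < switch_step and t < len(guide_actions):
            action = guide_actions[t]            # guide-policy stage
        else:
            action = explore_policy(obs)         # exploration-policy stage
        obs, reward, terminated, truncated, _ = env.step(action)
        reward = np.asarray(reward, dtype=float)
        total = reward if total is None else total + reward
        if terminated or truncated:
            break
    return total

def next_switch_step(switch_step, success, step_size=5):
    """Curriculum schedule (sketch): once the exploration policy reliably matches the
    guide from the current handover point, move the handover earlier so the learner
    controls a longer suffix of the episode."""
    return max(0, switch_step - step_size) if success else switch_step
```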
Empirical Evaluation: Extensive experiments on benchmark MORL environments demonstrate the superiority of DG-MORL over state-of-the-art MORL algorithms in terms of learning efficiency, final performance, and robustness.
Theoretical Analysis: The paper derives lower and upper bounds on the sample requirements of DG-MORL, providing theoretical insight into its sample efficiency.
Overall, the DG-MORL algorithm effectively leverages prior demonstrations to enhance exploration and policy learning in complex MORL tasks, outperforming existing methods.
Source: Junlin Lu et al., arxiv.org, 04-08-2024, https://arxiv.org/pdf/2404.03997.pdf