The paper introduces the DG-MORL algorithm, which addresses the challenges in multi-objective reinforcement learning (MORL) by leveraging prior demonstrations as guidance. The key highlights are:
Demonstration-Preference Alignment: The algorithm computes corner weights from the demonstrations' multi-objective returns to align the demonstrations with user preferences, addressing the demonstration-preference misalignment issue (a two-objective corner-weight sketch appears after this list).
Self-Evolving Mechanism: A self-evolving mechanism continuously updates and improves the guiding demonstrations, mitigating the impact of sub-optimal initial demonstrations and resolving the demonstration deadlock problem.
Multi-Stage Curriculum: A multi-stage curriculum enables a smooth transition from the guide policy to the exploration policy, addressing the policy-shift challenge (a combined sketch of the curriculum rollout and the self-evolving update follows the summary paragraph below).
Empirical Evaluation: Extensive experiments on benchmark MORL environments demonstrate the superiority of DG-MORL over state-of-the-art MORL algorithms in terms of learning efficiency, final performance, and robustness.
Theoretical Analysis: The paper derives lower and upper bounds on the sample efficiency of the DG-MORL algorithm.
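To give a rough sense of the corner-weight idea in the two-objective case: given each demonstration's multi-objective return vector, corner weights are the preference weights at which the best demonstration under linear scalarization changes. The sketch below is illustrative only and not taken from the paper; the function name, `demo_returns`, and the tolerance are assumptions.

```python
import numpy as np

def corner_weights_2d(demo_returns):
    """Corner weights for two objectives: weight vectors w = (w1, 1 - w1) at which
    the demonstration maximizing the scalarized return w . v changes."""
    V = np.asarray(demo_returns, dtype=float)
    w1_candidates = {0.0, 1.0}                      # boundary weights are always corners
    for i in range(len(V)):
        for j in range(i + 1, len(V)):
            dv = V[i] - V[j]
            denom = dv[0] - dv[1]
            if abs(denom) < 1e-12:                  # parallel scalarization lines: no crossing
                continue
            w1 = -dv[1] / denom                     # solve w1*dv[0] + (1 - w1)*dv[1] = 0
            if 0.0 < w1 < 1.0:
                w = np.array([w1, 1.0 - w1])
                scal = V @ w
                # keep the crossing only if both demonstrations are optimal there
                if np.isclose(scal[i], scal.max()) and np.isclose(scal[j], scal.max()):
                    w1_candidates.add(w1)
    return [np.array([c, 1.0 - c]) for c in sorted(w1_candidates)]
```

For example, with return vectors [10, 0], [0, 10], and [6, 6], the sketch yields corner weights at w1 of 0, 0.4, 0.6, and 1; each demonstration can then be assigned to the corner weights at which it is optimal, which is one plausible way to read the alignment step described above.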
Overall, the DG-MORL algorithm effectively leverages prior demonstrations to enhance exploration and policy learning in complex MORL tasks, outperforming existing methods.
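To make the curriculum and self-evolving mechanisms more concrete, here is a minimal sketch of how a guide-then-explore rollout and a demonstration update could be wired together. It assumes a Gymnasium-style environment returning vector-valued rewards and callable policies; all names (guide_policy, explore_policy, demos, guide_steps) are illustrative assumptions, not the paper's API.

```python
import numpy as np

def curriculum_rollout(env, guide_policy, explore_policy, guide_steps, weight, max_steps=500):
    """One episode: the guide (demonstration-derived) policy controls the first
    `guide_steps` steps, then the exploration policy takes over. Shrinking
    `guide_steps` across training stages yields the multi-stage curriculum."""
    obs, _ = env.reset()
    actions, vec_return = [], np.zeros_like(weight, dtype=float)
    for t in range(max_steps):
        policy = guide_policy if t < guide_steps else explore_policy
        action = policy(obs, weight)
        obs, vec_reward, terminated, truncated, _ = env.step(action)
        actions.append(action)
        vec_return += np.asarray(vec_reward, dtype=float)
        if terminated or truncated:
            break
    return actions, vec_return

def self_evolve(demos, key, actions, vec_return, weight):
    """Self-evolving update (sketch): replace the stored demonstration for this
    preference weight whenever a new rollout achieves a higher scalarized return."""
    old = demos.get(key)
    if old is None or float(np.dot(weight, vec_return)) > float(np.dot(weight, old[1])):
        demos[key] = (actions, vec_return)
```

In training, `guide_steps` would start near the demonstration's length and shrink as the exploration policy matches the guide's scalarized return, which is the rough intuition behind the policy-shift handling summarized above.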
Source: Junlin Lu et al., arxiv.org, 04-08-2024, https://arxiv.org/pdf/2404.03997.pdf