This paper presents a dynamic weight-based preference inference (DWPI) algorithm that accurately infers user preferences from demonstrations, including sub-optimal ones, in multi-objective reinforcement learning settings.
This paper proposes a novel demonstration-guided multi-objective reinforcement learning (DG-MORL) algorithm that uses prior demonstrations to improve exploration efficiency and policy performance in complex multi-objective tasks.