This paper introduces a novel framework that combines multi-objective reinforcement learning (MORL) with demonstration-based learning to enable robots to adapt their navigation behavior to changing user preferences without retraining.
The key highlights are:
The framework exposes three tuneable navigation objectives: human distance keeping, navigational efficiency, and demonstration-like behavior. These objectives can be dynamically re-weighted at deployment time to reflect the user's preferences (a minimal weighting sketch follows the highlights below).
The demonstration-like behavior objective is integrated into the MORL reward model using Disturbance-based Reward Extrapolation (D-REX), which allows a reward function to be learned from a single demonstration trajectory (sketched at the end of this summary).
Extensive evaluations, including qualitative analysis of the navigation behavior and quantitative metrics, demonstrate the agent's ability to adapt its navigation style according to the specified preferences.
The framework is successfully transferred to real-world robot platforms, showing both sim-to-real and robot-to-robot transfer capabilities.
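To make the preference weighting concrete, here is a minimal sketch of how a MORL agent's per-step rewards for the three objectives could be linearly scalarized with user-supplied weights. The function and variable names (`r_distance`, `r_efficiency`, `r_demo`, `w`) and the weight values are illustrative assumptions; the paper's exact reward formulation and preference-conditioning mechanism are not reproduced here.

```python
import numpy as np

# Hypothetical per-step objective rewards; the names and the linear
# weighting scheme are illustrative, not taken from the paper.
def scalarized_reward(r_distance: float, r_efficiency: float,
                      r_demo: float, w: np.ndarray) -> float:
    """Linearly scalarize the three objective rewards with
    user-preference weights w (non-negative, summing to 1)."""
    rewards = np.array([r_distance, r_efficiency, r_demo])
    return float(np.dot(w, rewards))

# Example: a user who prioritizes comfortable distance keeping
# over efficiency and demonstration-like style.
w = np.array([0.6, 0.2, 0.2])
r = scalarized_reward(r_distance=0.8, r_efficiency=0.3, r_demo=0.5, w=w)
print(f"scalarized reward: {r:.2f}")
```

Changing `w` at inference time changes the navigation style without retraining, which is the core adaptability claim of the framework.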
The proposed approach addresses a key limitation of traditional reinforcement learning methods, which typically cannot adapt to changing user preferences without retraining. By integrating demonstration data as a tuneable objective, the robot can learn nuanced navigation styles that are difficult to express with an analytical reward function.
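As a companion to the D-REX mention above, the following is a minimal sketch of D-REX-style reward learning (Brown et al., 2020): rollouts of a behavior-cloned policy are ranked by the amount of action noise injected during generation (more noise is assumed worse), and a reward network is trained with a Bradley-Terry pairwise ranking loss so the less-noisy trajectory receives the higher predicted return. The network architecture, state dimension, and random placeholder trajectories here are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Predicts a per-state reward; a trajectory's return is the
    sum of predicted rewards over its states."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        return self.net(traj).sum()

def drex_pairwise_loss(reward_net, traj_worse, traj_better):
    """Bradley-Terry ranking loss: the trajectory generated with
    less injected noise should receive the higher return."""
    logits = torch.stack([reward_net(traj_worse),
                          reward_net(traj_better)])
    # Target class 1 = the (assumed) better trajectory.
    return nn.functional.cross_entropy(logits.unsqueeze(0),
                                       torch.tensor([1]))

# Usage sketch: rank rollouts of a behavior-cloned policy by the
# noise level used to generate them, then train on sampled pairs.
state_dim = 8  # placeholder observation size
net = RewardNet(state_dim)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
traj_noisy = torch.randn(50, state_dim)  # high-noise rollout (worse)
traj_clean = torch.randn(50, state_dim)  # low-noise rollout (better)
loss = drex_pairwise_loss(net, traj_noisy, traj_clean)
opt.zero_grad()
loss.backward()
opt.step()
```

The appeal of this scheme is that the noise-level rankings are generated automatically, so a reward model can be extrapolated from a single demonstration without any human preference labels.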
Key Insights Distilled From
by Jorge de Heu... at arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04857.pdf