The content discusses the benefits of dynamics-awareness in preference-based reinforcement learning (PbRL). It introduces the PbRL setting, explores dynamics-aware reward functions, and presents experimental results demonstrating the effectiveness of these methods across a range of tasks and feedback budgets.
The authors highlight the challenges of specifying reliable numerical reward functions in traditional reinforcement learning and introduce PbRL as a solution that infers reward values from preference feedback. They propose using dynamics-aware reward functions to improve sample efficiency in PbRL by incorporating environment dynamics into the learning process.
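PbRL methods of this kind typically train the reward model from pairwise comparisons between trajectory segments using a Bradley-Terry preference model. Below is a minimal sketch of that standard preference-learning loss, assuming a PyTorch MLP reward model; the `RewardModel` class, its layer sizes, and the batching format are illustrative assumptions rather than the paper's exact implementation, and REED's additional dynamics objective (an auxiliary self-supervised loss on the reward network's encoder, per the paper's idea of incorporating environment dynamics) is not implemented here.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical MLP mapping (state, action) pairs to scalar rewards."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # Returns per-step reward predictions, shape (batch, segment_len).
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(model: RewardModel, seg_a: dict, seg_b: dict,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss over pairs of trajectory segments.

    seg_a, seg_b: dicts with 'obs' and 'act' tensors of shape
                  (batch, segment_len, dim).
    prefs: tensor of shape (batch,), 1.0 if segment A is preferred,
           0.0 if segment B is preferred (0.5 for ties).
    """
    # Sum predicted per-step rewards over each segment.
    ret_a = model(seg_a["obs"], seg_a["act"]).sum(dim=1)
    ret_b = model(seg_b["obs"], seg_b["act"]).sum(dim=1)
    # P(A preferred over B) is sigmoid of the return difference.
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)
```

Summing the predicted per-step rewards over a segment before the logistic comparison is what lets a scalar per-step reward function be recovered from segment-level preference labels.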
Through experiments on locomotion and object manipulation tasks, they show that REED (Rewards Encoding Environment Dynamics) outperforms existing methods such as SURF, RUNE, and MRN in policy performance, and that REED-based methods retain that performance with significantly fewer pieces of feedback than the baseline approaches.
The study also compares different labelling strategies for preference feedback and analyzes the impact of image-space observations on policy performance. The authors conclude that dynamics awareness is crucial for improving sample efficiency in preference-based reinforcement learning.
Key insights distilled from the paper by Katherine Me... at arxiv.org, 02-29-2024: https://arxiv.org/pdf/2402.17975.pdf