Venkataraman, S., Wang, Y., Wang, Z., Erickson, Z., & Held, D. (2024). Real-World Offline Reinforcement Learning from Vision Language Model Feedback. arXiv preprint arXiv:2411.05273.
This research aims to address the challenge of reward labeling in offline reinforcement learning (RL) for complex, real-world robotics tasks by introducing a system that automatically generates reward labels from unlabeled datasets using vision-language models (VLMs).
The researchers developed Offline RL-VLM-F, a two-phase system. In the reward labeling phase, the system samples image observation pairs from an unlabeled dataset and queries a VLM for preferences based on a text description of the task. These preferences are then used to train a reward model. In the policy learning phase, the learned reward model labels the entire dataset, which is then used to train a policy with Implicit Q-Learning (IQL).
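The sketch below illustrates this two-phase pipeline under stated assumptions: it uses a Bradley-Terry-style preference loss for the reward model (the standard objective in preference-based reward learning, not spelled out in this summary), and the helpers `sample_observation_pair`, `query_vlm`, `iql_trainer`, and the transition fields are hypothetical placeholders rather than the paper's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps a (flattened) image observation to a scalar reward."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)


def preference_loss(r1: torch.Tensor, r2: torch.Tensor, pref: int) -> torch.Tensor:
    # Bradley-Terry-style cross-entropy: pref = 0 means the VLM preferred obs1, 1 means obs2.
    logits = torch.stack([r1, r2]).unsqueeze(0)       # shape (1, 2)
    return F.cross_entropy(logits, torch.tensor([pref]))


# Phase 1: reward labeling -- sample observation pairs, query the VLM, fit the reward model.
def train_reward_model(reward_model, dataset, query_vlm, task_description, steps=1000):
    opt = torch.optim.Adam(reward_model.parameters(), lr=3e-4)
    for _ in range(steps):
        obs1, obs2 = dataset.sample_observation_pair()   # hypothetical dataset helper
        pref = query_vlm(obs1, obs2, task_description)    # 0 if obs1 preferred, 1 if obs2
        loss = preference_loss(reward_model(obs1), reward_model(obs2), pref)
        opt.zero_grad()
        loss.backward()
        opt.step()


# Phase 2: relabel every transition with the learned reward, then run offline RL (e.g., IQL).
def relabel_and_train_policy(reward_model, dataset, iql_trainer):
    with torch.no_grad():
        for transition in dataset:                        # hypothetical transition objects
            transition.reward = reward_model(transition.obs).item()
    iql_trainer.train(dataset)                            # any IQL implementation can consume the relabeled data
```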
The study demonstrates the effectiveness of using VLMs for automatic reward labeling in offline RL, enabling the learning of complex manipulation tasks in both simulation and real-world settings, even with sub-optimal datasets. This approach eliminates the need for manually labeled rewards, which are often difficult and time-consuming to obtain for complex tasks.
This research significantly contributes to the field of robotics by presenting a practical and effective method for learning robot control policies from readily available, unlabeled datasets, potentially accelerating the development and deployment of robots in real-world applications.
The study primarily focuses on single-task learning. Future research could explore extending Offline RL-VLM-F to multi-task learning scenarios and investigate its performance with different VLMs and offline RL algorithms. Additionally, exploring methods to improve the sample efficiency of the reward learning phase would be beneficial.