Venkataraman, S., Wang, Y., Wang, Z., Erickson, Z., & Held, D. (2024). Real-World Offline Reinforcement Learning from Vision Language Model Feedback. arXiv preprint arXiv:2411.05273.
This research aims to address the challenge of reward labeling in offline reinforcement learning (RL) for complex, real-world robotics tasks by introducing a system that automatically generates reward labels from unlabeled datasets using vision-language models (VLMs).
The researchers developed Offline RL-VLM-F, a two-phase system. In the reward labeling phase, the system samples pairs of image observations from an unlabeled dataset and asks a VLM which observation it prefers, given a text description of the task. These preferences are then used to train a reward model. In the policy learning phase, the learned reward model labels the entire dataset with rewards, and the labeled dataset is used to train a policy with Implicit Q-Learning (IQL).
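The two phases can be illustrated with a toy sketch. It is not the authors' implementation and rests on several assumptions: observations are small hypothetical feature vectors instead of images, the reward model is linear, VLM preferences are simulated by a rule, and preferences are fit with a Bradley-Terry model (a common choice for pairwise preference learning that this summary does not itself specify). The names `reward`, `bt_prob`, `train_reward_model`, `relabel`, and `expectile_loss` are hypothetical. Of IQL, only the expectile value loss is shown; full IQL also trains a Q-function and extracts the policy with advantage-weighted regression.

```python
import math
import random

# Toy sketch of the two phases of Offline RL-VLM-F, under simplifying
# assumptions: observations are hypothetical feature vectors (not images),
# the reward model is linear, and VLM preferences are simulated.


def reward(w, x):
    """Hypothetical linear reward model r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def bt_prob(w, xa, xb):
    """Bradley-Terry probability that observation xa is preferred over xb."""
    return 1.0 / (1.0 + math.exp(reward(w, xb) - reward(w, xa)))


def train_reward_model(prefs, dim, lr=0.1, epochs=200):
    """Phase 1: fit w by gradient descent on the Bradley-Terry cross-entropy.

    prefs holds (xa, xb, y) triples, with y = 1.0 if the VLM preferred xa
    and y = 0.0 if it preferred xb.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for xa, xb, y in prefs:
            p = bt_prob(w, xa, xb)
            # Gradient of the negative log-likelihood: (p - y) * (xa - xb)
            for i in range(dim):
                grad[i] += (p - y) * (xa[i] - xb[i])
        w = [wi - lr * gi / len(prefs) for wi, gi in zip(w, grad)]
    return w


def relabel(dataset, w):
    """Phase 2, step 1: label every (obs, action) pair with the learned reward."""
    return [(obs, act, reward(w, obs)) for obs, act in dataset]


def expectile_loss(u, tau=0.7):
    """IQL value loss |tau - 1(u < 0)| * u^2, where u = Q(s, a) - V(s)."""
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


if __name__ == "__main__":
    rng = random.Random(0)
    # Feature 0 stands in for visible task progress; feature 1 is noise.
    obs = [[rng.random(), rng.random()] for _ in range(50)]
    # Simulated VLM: prefers whichever observation shows more progress.
    prefs = []
    for _ in range(200):
        xa, xb = rng.sample(obs, 2)
        prefs.append((xa, xb, 1.0 if xa[0] > xb[0] else 0.0))
    w = train_reward_model(prefs, dim=2)
    labeled = relabel([(o, 0) for o in obs], w)
    print("learned weights:", w)
    print("first labeled transition:", labeled[0])
```

Because the simulated labels depend only on feature 0, the learned weight on that feature comes out positive and larger in magnitude than the weight on the noise feature, so the learned reward ranks observations by task progress. The value of `tau` (0.7 here) is an illustrative choice; values above 0.5 push V(s) toward an upper expectile of Q(s, a).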
The study demonstrates the effectiveness of using VLMs for automatic reward labeling in offline RL, enabling the learning of complex manipulation tasks in both simulation and real-world settings, even with sub-optimal datasets. This approach eliminates the need for manually labeled rewards, which are often difficult and time-consuming to obtain for complex tasks.
This research contributes to robotics a practical method for learning robot control policies from readily available, unlabeled datasets, which could accelerate the development and deployment of robots in real-world applications.
The study primarily focuses on single-task learning. Future research could extend Offline RL-VLM-F to multi-task settings and evaluate it with other VLMs and offline RL algorithms. Methods that improve the sample efficiency of the reward learning phase would also be worth exploring.
Key insights extracted from: arxiv.org, by Sreyas Venka..., 11-11-2024
https://arxiv.org/pdf/2411.05273.pdf