Leveraging Unlabeled Data to Improve Offline Reinforcement Learning Across Two Domains
A novel offline reinforcement learning problem setting, Positive-Unlabeled Offline RL (PUORL), is introduced to effectively utilize domain-unlabeled data in scenarios where the data come from two distinct domains. An algorithmic framework is proposed that leverages positive-unlabeled learning to predict domain labels for the unlabeled samples and integrate the domain-unlabeled data into policy training.
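To make the domain-labeling step concrete, the following is a minimal sketch (not the authors' implementation) of how a positive-unlabeled domain classifier could be trained on transitions, using the non-negative PU risk estimator of Kiryo et al. (2017) with a sigmoid surrogate loss. The names `DomainClassifier`, `nnpu_loss`, and the assumption that the class prior `class_prior` (the fraction of target-domain samples among the unlabeled data) is known or separately estimated are illustrative assumptions, not details taken from the paper.

```python
# Sketch: nnPU domain classifier over (observation, action) features.
# Assumes class_prior is known/estimated; architecture and loss choice are illustrative.
import torch
import torch.nn as nn


class DomainClassifier(nn.Module):
    """Scores how likely a transition belongs to the positively labeled domain."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def nnpu_loss(pos_scores: torch.Tensor, unl_scores: torch.Tensor, class_prior: float) -> torch.Tensor:
    """Non-negative PU risk with the sigmoid loss l(z, y) = sigmoid(-y * z)."""
    risk_pos = torch.sigmoid(-pos_scores).mean()         # positives predicted positive
    risk_pos_as_neg = torch.sigmoid(pos_scores).mean()    # positives predicted negative
    risk_unl_as_neg = torch.sigmoid(unl_scores).mean()    # unlabeled predicted negative
    # Estimated negative-class risk, clamped at zero (the "non-negative" correction).
    risk_neg = torch.clamp(risk_unl_as_neg - class_prior * risk_pos_as_neg, min=0.0)
    return class_prior * risk_pos + risk_neg


# Illustrative training step on random data (stand-ins for the offline dataset).
if __name__ == "__main__":
    obs_dim, act_dim, batch = 17, 6, 64
    clf = DomainClassifier(obs_dim, act_dim)
    opt = torch.optim.Adam(clf.parameters(), lr=3e-4)

    pos_obs, pos_act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)
    unl_obs, unl_act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)

    loss = nnpu_loss(clf(pos_obs, pos_act), clf(unl_obs, unl_act), class_prior=0.5)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once such a classifier is trained, one plausible use (again, a sketch rather than the paper's exact procedure) is to pseudo-label the domain-unlabeled transitions whose predicted score clears a threshold as belonging to the target domain and merge them into the dataset used by an off-the-shelf offline RL algorithm.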