Leveraging Unlabeled Data to Improve Offline Reinforcement Learning Across Two Domains
A novel offline reinforcement learning problem setting, Positive-Unlabeled Offline RL (PUORL), is introduced to effectively utilize domain-unlabeled data in scenarios where the data come from two distinct domains. An algorithmic framework is proposed that leverages positive-unlabeled learning to predict domain labels for the unlabeled samples and integrate the domain-unlabeled data into policy training.
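To make the domain-labeling step concrete, the following is a minimal sketch (not the authors' implementation) of how a positive-unlabeled domain classifier could be trained on transitions, using the non-negative PU risk estimator of Kiryo et al. (2017) with a sigmoid surrogate loss. The names `DomainClassifier`, `nnpu_loss`, and the assumption that the class prior `class_prior` (the fraction of target-domain samples among the unlabeled data) is known or separately estimated are illustrative assumptions, not details taken from the paper.

```python
# Sketch: nnPU domain classifier over (observation, action) features.
# Assumes class_prior is known/estimated; architecture and loss choice are illustrative.
import torch
import torch.nn as nn


class DomainClassifier(nn.Module):
    """Scores how likely a transition belongs to the positively labeled domain."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def nnpu_loss(pos_scores: torch.Tensor, unl_scores: torch.Tensor, class_prior: float) -> torch.Tensor:
    """Non-negative PU risk with the sigmoid loss l(z, y) = sigmoid(-y * z)."""
    risk_pos = torch.sigmoid(-pos_scores).mean()         # positives predicted positive
    risk_pos_as_neg = torch.sigmoid(pos_scores).mean()    # positives predicted negative
    risk_unl_as_neg = torch.sigmoid(unl_scores).mean()    # unlabeled predicted negative
    # Estimated negative-class risk, clamped at zero (the "non-negative" correction).
    risk_neg = torch.clamp(risk_unl_as_neg - class_prior * risk_pos_as_neg, min=0.0)
    return class_prior * risk_pos + risk_neg


# Illustrative training step on random data (stand-ins for the offline dataset).
if __name__ == "__main__":
    obs_dim, act_dim, batch = 17, 6, 64
    clf = DomainClassifier(obs_dim, act_dim)
    opt = torch.optim.Adam(clf.parameters(), lr=3e-4)

    pos_obs, pos_act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)
    unl_obs, unl_act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)

    loss = nnpu_loss(clf(pos_obs, pos_act), clf(unl_obs, unl_act), class_prior=0.5)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Once such a classifier is trained, one plausible use (again, a sketch rather than the paper's exact procedure) is to pseudo-label the domain-unlabeled transitions whose predicted score clears a threshold as belonging to the target domain and merge them into the dataset used by an off-the-shelf offline RL algorithm.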