Efficient Offline Reinforcement Learning with Behavioral Supervisor Tuning
An algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support, enabling more effective policy learning from offline datasets compared to previous methods without requiring per-dataset tuning.