RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
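RIME is a robust preference-based reinforcement learning (PbRL) algorithm designed for effective reward learning from noisy preferences. It combines a denoising discriminator, which filters out preference labels that are likely to be corrupted, with a warm-start method for the reward model, with the aim of improving both robustness and feedback efficiency.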
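
To make the denoising idea concrete, the following is a minimal sketch of loss-based sample selection for preference labels. It is an illustration under stated assumptions, not the authors' implementation: it assumes a Bradley-Terry preference model over segment returns, a fixed loss threshold, and hypothetical helper names (`preference_prob`, `cross_entropy`, `filter_preferences`). The summary above does not specify how the discriminator sets its threshold, so the value used here is arbitrary.

```python
import math

def preference_prob(ret_a, ret_b):
    """Bradley-Terry probability that segment A is preferred over segment B,
    given the estimated returns of the two segments (assumed model)."""
    return 1.0 / (1.0 + math.exp(ret_b - ret_a))

def cross_entropy(p_a, label):
    """Loss of the predicted P(A preferred) against a label
    (1 = A preferred, 0 = B preferred)."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p_a + eps) + (1 - label) * math.log(1.0 - p_a + eps))

def filter_preferences(samples, threshold):
    """Split samples into trusted and suspected-noisy sets.

    samples: list of (ret_a, ret_b, label) tuples.
    A sample is trusted when its loss under the current reward estimate
    is below `threshold` (a hypothetical, fixed cutoff).
    """
    trusted, suspected = [], []
    for ret_a, ret_b, label in samples:
        loss = cross_entropy(preference_prob(ret_a, ret_b), label)
        if loss < threshold:
            trusted.append((ret_a, ret_b, label))
        else:
            suspected.append((ret_a, ret_b, label))
    return trusted, suspected

# Toy data: the second label contradicts the estimated returns.
samples = [(2.0, 0.0, 1), (0.0, 2.0, 1), (1.0, 1.5, 0)]
trusted, suspected = filter_preferences(samples, threshold=1.0)
```

In this toy run, the second sample is set aside as suspected noise because its label strongly disagrees with the current reward estimate, while the other two are kept for reward learning. A selection rule like this relies on the reward estimate being reasonably good already, which suggests why a sensible initialization of the reward model matters.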