RLCD proposes a method for aligning language models without human feedback: it generates simulated preference pairs by prompting a model with contrasting positive and negative prompts, then uses those pairs for reinforcement learning.
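The pair-generation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the `generate` stub, and the `make_preference_pair` helper are all illustrative assumptions, and a real pipeline would sample from an actual language model.

```python
def generate(prompt: str) -> str:
    """Stand-in for sampling from a base language model.

    A real implementation would call an LLM; here we just echo the
    prompt so the example runs without a model.
    """
    return f"[response conditioned on: {prompt}]"


def make_preference_pair(instruction: str) -> dict:
    # Contrasting prompts: one steers generation toward the desired
    # behavior, the other away from it.
    pos_prompt = f"{instruction}\n(Give a helpful, harmless answer.)"
    neg_prompt = f"{instruction}\n(Give an unhelpful, harmful answer.)"
    chosen = generate(pos_prompt)
    rejected = generate(neg_prompt)
    # The positively prompted output is labeled as preferred
    # automatically, so no human annotation is required.
    return {"prompt": instruction, "chosen": chosen, "rejected": rejected}


pair = make_preference_pair("Explain how vaccines work.")
```

The resulting `chosen`/`rejected` pairs would then serve as training data for a preference model used in the reinforcement learning stage.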