Enhancing Conversational Large Language Models with Direct RLHF
The author proposes Mistral-Plus, a novel approach that bypasses Supervised Fine-Tuning (SFT) in favor of direct harmless Reinforcement Learning from Human Feedback (RLHF). By skipping SFT, this method preserves the base model's general capabilities while enhancing its conversational abilities and reducing toxic outputs.
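To make the pipeline difference concrete, here is a schematic sketch (not real training code) contrasting the conventional alignment pipeline with the direct-RLHF pipeline described above. The stage lists and function names are illustrative assumptions, not the paper's actual implementation:

```python
# Schematic comparison of alignment pipelines.
# Each "model" is just a list of training stages applied so far;
# no actual training occurs in this sketch.

def conventional_pipeline(base_model):
    """Conventional alignment: SFT first, then RLHF."""
    model = base_model + ["SFT"]
    model = model + ["RLHF"]
    return model

def direct_rlhf_pipeline(base_model):
    """Mistral-Plus-style alignment: skip SFT and apply
    harmless RLHF directly to the base model, keeping the
    base model's general capabilities intact."""
    return base_model + ["harmless RLHF"]

print(conventional_pipeline(["base"]))   # ['base', 'SFT', 'RLHF']
print(direct_rlhf_pipeline(["base"]))    # ['base', 'harmless RLHF']
```

The key design point is that the SFT stage, which can overwrite base-model behavior, is removed entirely; alignment pressure comes only from the harmlessness-oriented RLHF stage.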