Enhancing Factuality in Large Language Model Alignment Through Factuality-Aware Supervised Fine-Tuning and Direct Preference Optimization
Factuality-aware alignment, comprising factuality-aware supervised fine-tuning (SFT) and direct preference optimization (DPO), can guide large language models to generate more factual responses while maintaining their instruction-following capability.
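The DPO half of this recipe builds on the standard direct preference optimization objective, which scores a chosen response against a rejected one relative to a frozen reference model; in the factuality-aware variant the preference pairs are constructed with factuality in mind. As a minimal sketch of that underlying objective (the function name and the assumption that summed response log-probabilities are precomputed are illustrative, not from the paper):

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    pi_*  : summed log-probabilities under the policy being trained
    ref_* : summed log-probabilities under the frozen reference model
    beta  : strength of the implicit KL constraint to the reference
    """
    # Margin between the chosen and rejected implicit rewards
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(logits)): small when the policy already prefers
    # the chosen response more strongly than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

The loss shrinks as the policy assigns the chosen (here, more factual) response a larger margin over the rejected one than the reference model does, which is how preference training can push generations toward factuality without an explicit reward model.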