Latent Distance Guided Alignment Training: Enhancing Large Language Models without Relying on Human Annotations
LD-Align is a novel DPO-based approach that aligns a fine-tuned large language model with a high-quality supervised fine-tuning dataset, without requiring additional human annotations or a more powerful language model.
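
For readers unfamiliar with the DPO objective that the approach builds on, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the LD-Align method itself: the summary above does not specify how latent distance enters the objective, so that part is omitted. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence-level log-probabilities under the policy being
    trained and under a frozen reference model. All names are hypothetical.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)), written as a numerically stable softplus.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# When the policy matches the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Raising the chosen log-probability and lowering the rejected one reduces the loss.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

In a setting like the one described, where no human preference labels are available, the chosen and rejected responses would have to come from somewhere else, for example the supervised fine-tuning targets versus the model's own generations. That pairing is an assumption here, not a detail confirmed by the summary.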