Aligning Large Language Models via Joint Preference Optimization
Eliciting human preferences jointly over instruction-response pairs, rather than only between alternative responses to the same instruction, can significantly improve the alignment of large language models by drawing on a broader spectrum of human preference judgments.
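As a rough illustration of the distinction, consider a DPO-style contrastive objective; the notation below (policy $\pi_\theta$, frozen reference $\pi_{\mathrm{ref}}$, temperature $\beta$, preferred/dispreferred pairs indexed $w$/$l$) follows common DPO conventions and is an assumption of this sketch, not necessarily the paper's exact formulation. Conditional preference optimization compares two responses to one shared instruction, whereas a joint variant compares whole instruction-response pairs:

% Conditional preference: both responses share a single instruction x.
% Joint preference: each side carries its own instruction-response pair,
% so comparisons across different instructions become admissible.
\begin{align}
  \mathcal{L}_{\mathrm{cond}}(\theta)
    &= -\log \sigma\!\Big(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \Big), \\
  \mathcal{L}_{\mathrm{joint}}(\theta)
    &= -\log \sigma\!\Big(
        \beta \log \frac{\pi_\theta(y_w \mid x_w)}{\pi_{\mathrm{ref}}(y_w \mid x_w)}
      - \beta \log \frac{\pi_\theta(y_l \mid x_l)}{\pi_{\mathrm{ref}}(y_l \mid x_l)}
      \Big),
  \quad x_w \neq x_l \text{ allowed.}
\end{align}

Permitting $x_w \neq x_l$ is what lets annotators rank pairs drawn from different instructions, which is the broader elicitation the sentence above refers to.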