The author argues for the importance of grounding large language models (LLMs) with evident preferences to achieve controllable preference optimization (CPO). By explicitly specifying preference scores for different objectives, CPO guides the model to generate responses that align with various preferences.
In short, grounding LLMs with evident preferences through controllable preference optimization enables multi-objective alignment.
Achieving this goal requires explicit conditional training.
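As a concrete illustration of conditioning a model on explicit preference scores, the sketch below builds a preference-prefixed prompt. The function name, control-token format, and score scale are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of preference-conditioned prompting.
# The "<Objective:score>" token format and 1-5 scale are assumptions
# for illustration, not the exact CPO implementation.

def build_conditioned_prompt(instruction: str, preferences: dict) -> str:
    """Prepend explicit preference scores so the model can condition on them."""
    condition = " ".join(
        f"<{objective}:{score}>" for objective, score in preferences.items()
    )
    return f"{condition} {instruction}"

prompt = build_conditioned_prompt(
    "Explain how vaccines work.",
    {"Helpfulness": 5, "Honesty": 5, "Harmlessness": 4},
)
print(prompt)
# → <Helpfulness:5> <Honesty:5> <Harmlessness:4> Explain how vaccines work.
```

Training on many such (condition, response) pairs then lets the model trade off objectives at inference time by changing the scores in the prefix.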