The paper introduces Reward Guided Latent Consistency Distillation (RG-LCD) to improve image synthesis by aligning a Latent Consistency Model (LCM) with human preferences. By integrating feedback from a reward model (RM) into the distillation process, RG-LCD accelerates inference without compromising sample quality. To avoid reward over-optimization, the method optimizes against a latent proxy RM (LRM) rather than the RM directly. Empirical results show that RG-LCD outperforms baseline methods in both sample quality and inference speed, and human evaluation together with automatic metrics confirms its effectiveness at generating high-quality images aligned with human preferences.
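The summary above describes combining a consistency-distillation objective with feedback from a latent proxy reward model. A minimal sketch of such a combined objective is shown below; this is an illustrative assumption, not the paper's actual implementation, and all function names (`lcd_loss`, `latent_reward`, `rg_lcd_loss`) are hypothetical:

```python
import numpy as np

# Hedged sketch: RG-LCD-style training combines a consistency
# distillation loss with a reward term computed in latent space.
# All names and the reward definition here are illustrative.

def lcd_loss(student_out, teacher_out):
    # Consistency distillation term: match the student's prediction
    # to the teacher's target in latent space (squared-error sketch).
    return np.mean((student_out - teacher_out) ** 2)

def latent_reward(latent):
    # Stand-in for a latent proxy RM: scores a latent directly,
    # avoiding decoding to pixel space (purely illustrative scoring).
    preferred = np.zeros_like(latent)
    return -np.mean((latent - preferred) ** 2)

def rg_lcd_loss(student_out, teacher_out, reward_weight=0.1):
    # Total objective: distillation loss minus a weighted reward,
    # so minimizing the loss also maximizes the latent reward.
    return (lcd_loss(student_out, teacher_out)
            - reward_weight * latent_reward(student_out))

# Toy latents standing in for student and teacher outputs.
student = np.array([0.5, -0.2, 0.1])
teacher = np.array([0.4, -0.1, 0.0])
loss = rg_lcd_loss(student, teacher)
```

Scoring latents with a proxy RM, rather than decoding and scoring images with the RM itself, is what the summary credits with mitigating reward over-optimization.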
Key insights distilled from: Jiachen Li, W... on arxiv.org, 03-19-2024
https://arxiv.org/pdf/2403.11027.pdf