The paper introduces Reward Guided Latent Consistency Distillation (RG-LCD) to improve image synthesis by aligning a Latent Consistency Model (LCM) with human preferences. By integrating feedback from a reward model into the distillation process, RG-LCD accelerates inference without compromising sample quality. To mitigate reward over-optimization, the method introduces a latent proxy RM (LRM) that scores samples directly in latent space. Empirical results show that RG-LCD outperforms baseline methods in both sample quality and inference speed, and human evaluation together with automatic metrics confirms that it generates high-quality images aligned with human preferences.
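The core idea is to add a reward term to the usual consistency-distillation objective, with the reward computed by a latent proxy RM rather than a pixel-space RM. The sketch below illustrates this combined objective in a toy PyTorch setup; all module names (student, teacher, latent_rm), the stand-in linear networks, and the reward_weight value are hypothetical placeholders, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of one reward-guided consistency distillation step.
# The real RG-LCD uses a latent diffusion teacher, an LCM student,
# and a learned latent proxy reward model (LRM); these linear layers
# are only stand-ins so the example runs.
import torch
import torch.nn as nn

latent_dim = 16

student = nn.Linear(latent_dim, latent_dim)           # LCM student
teacher = nn.Linear(latent_dim, latent_dim)           # frozen teacher target
latent_rm = nn.Sequential(nn.Linear(latent_dim, 1))   # latent proxy RM (LRM)
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
reward_weight = 0.1  # assumed weighting between the two loss terms

def rg_lcd_step(noisy_latent):
    # 1) Consistency-distillation term: the student's output should match
    #    the frozen teacher's target for the same noisy latent.
    with torch.no_grad():
        target = teacher(noisy_latent)
    pred = student(noisy_latent)
    distill_loss = nn.functional.mse_loss(pred, target)

    # 2) Reward term: encourage latents the proxy RM scores highly.
    #    Scoring in latent space avoids decoding to pixels and is the
    #    paper's mechanism for mitigating reward over-optimization.
    reward = latent_rm(pred).mean()

    loss = distill_loss - reward_weight * reward
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage on random latents.
print(rg_lcd_step(torch.randn(4, latent_dim)))
```

In practice the distillation target comes from solving the teacher's probability-flow ODE between adjacent timesteps, but the interplay of the two terms is the same as in this sketch.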
Key insights extracted from Jiachen Li, W... at arxiv.org, 03-19-2024
https://arxiv.org/pdf/2403.11027.pdf