Bibliographic Information: Miao, Y., Loh, W., Kothawade, S., Poupart, P., Rashwan, A., & Li, Y. (2024). Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning. Advances in Neural Information Processing Systems, 37.
Research Objective: This paper aims to address the limitations of existing text-to-image generation models in accurately portraying specific subjects from reference images while adhering to textual prompts. The authors propose a novel method, Reward Preference Optimization (RPO), to improve the fidelity of generated images to both reference images and textual descriptions.
Methodology: RPO leverages a novel λ-Harmonic reward function that combines image-to-image and text-to-image alignment scores to guide training. This reward also supports early stopping, which prevents overfitting to the reference images and accelerates training. The reward scores are converted into preference labels via the Bradley-Terry model, and these labels drive a preference-based reinforcement learning objective that fine-tunes a pre-trained diffusion model.
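To make the mechanism concrete, the sketch below illustrates one plausible reading of these two components: a λ-weighted harmonic mean of the image-alignment and text-alignment scores, and a logistic (sigmoid-of-reward-difference) form of the Bradley-Terry model that turns reward gaps into preference probabilities. The function names, the exact weighting, and the score ranges are illustrative assumptions, not the paper's verbatim formulation.

```python
import math

def lambda_harmonic_reward(r_image: float, r_text: float, lam: float = 0.5) -> float:
    """Hypothetical λ-Harmonic reward: a λ-weighted harmonic mean of an
    image-alignment score r_image and a text-alignment score r_text.
    Both scores are assumed to lie in (0, 1]."""
    return 1.0 / (lam / r_image + (1.0 - lam) / r_text)

def bradley_terry_preference(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry (logistic) probability that sample A is preferred over
    sample B, given their scalar rewards."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

# Example: compare two generated images against the same reference/prompt pair.
r_a = lambda_harmonic_reward(r_image=0.85, r_text=0.60, lam=0.3)
r_b = lambda_harmonic_reward(r_image=0.70, r_text=0.75, lam=0.3)
print(f"reward A = {r_a:.3f}, reward B = {r_b:.3f}")
print(f"P(A preferred over B) = {bradley_terry_preference(r_a, r_b):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two scores, a sample that matches the reference images but ignores the prompt (or vice versa) receives a low reward, which is consistent with the paper's goal of balancing subject fidelity and prompt adherence.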
Key Findings: RPO demonstrates superior performance compared to existing state-of-the-art methods on the DreamBench dataset, achieving a CLIP-I score of 0.833 and a CLIP-T score of 0.314. The ablation study highlights the importance of both the λ-Harmonic reward function and the preference loss in achieving these results. The λ-Harmonic reward function effectively guides the model towards generating images faithful to both reference images and textual prompts, while the preference loss acts as a regularizer, preventing overfitting to the reference images.
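For context, CLIP-I is conventionally computed as the average cosine similarity between CLIP embeddings of generated and reference images, and CLIP-T as the average cosine similarity between each generated image's CLIP embedding and its prompt's text embedding. The sketch below shows one way to compute both, assuming a Hugging Face CLIP backbone (openai/clip-vit-base-patch32); the paper's exact evaluation backbone and pairing scheme may differ, and the function names are illustrative.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed backbone; the paper's evaluation may use a different CLIP variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_image_embeds(images: list[Image.Image]) -> torch.Tensor:
    inputs = processor(images=images, return_tensors="pt")
    return F.normalize(model.get_image_features(**inputs), dim=-1)

@torch.no_grad()
def clip_text_embeds(prompts: list[str]) -> torch.Tensor:
    inputs = processor(text=prompts, return_tensors="pt", padding=True)
    return F.normalize(model.get_text_features(**inputs), dim=-1)

def clip_i(generated: list[Image.Image], references: list[Image.Image]) -> float:
    """CLIP-I: mean cosine similarity over all generated/reference image pairs."""
    g, r = clip_image_embeds(generated), clip_image_embeds(references)
    return (g @ r.T).mean().item()

def clip_t(generated: list[Image.Image], prompts: list[str]) -> float:
    """CLIP-T: mean cosine similarity between each generated image and its prompt."""
    g, t = clip_image_embeds(generated), clip_text_embeds(prompts)
    return (g * t).sum(dim=-1).mean().item()
```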
Main Conclusions: RPO presents a more efficient and effective approach for subject-driven text-to-image generation compared to existing methods. The proposed λ-Harmonic reward function and the use of preference-based reinforcement learning contribute significantly to its superior performance in generating high-fidelity images that accurately reflect both the subject and the textual description.
Significance: This research significantly contributes to the field of text-to-image generation by introducing a novel reward function and a more efficient training approach. RPO's ability to generate high-fidelity images faithful to both reference images and textual prompts has significant implications for various applications, including content creation, image editing, and design.
Limitations and Future Research: While RPO demonstrates promising results, the authors acknowledge limitations regarding the sensitivity of the λ-Harmonic reward function to the choice of λ value. Future research could explore methods for automatically determining the optimal λ value or investigate alternative reward functions less sensitive to hyperparameter tuning. Additionally, exploring the application of RPO to other text-to-image generation tasks beyond subject-driven generation could further validate its effectiveness and broader applicability.
Source: Key insights distilled from the paper by Yanting Miao et al., arxiv.org, 11-01-2024 — https://arxiv.org/pdf/2407.12164.pdf