מושגי ליבה
Incorporating human feedback improves instructional image editing models significantly.
תקציר
This paper introduces HIVE, a framework that leverages human feedback to enhance instructional visual editing. The framework collects human feedback on edited images to capture user preferences and uses scalable diffusion model fine-tuning methods to incorporate this feedback. Extensive experiments show that HIVE outperforms previous state-of-the-art models by a large margin. The paper also discusses the challenges, methodology, experiments, ablation studies, and limitations of the approach.
Structure:
- Introduction
- Abstract
- Related Work
- Methodology
- Experiments
- Baseline Comparisons
- Ablation Study
- Conclusion and Discussion
סטטיסטיקה
We present a new 1.1M training dataset, a 3.6K reward dataset, and a 1K evaluation dataset.
HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin.
The reward model Rϕ(˜x, c) reflects human preferences for edited images.
ציטוטים
"Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences."
"Our main contributions are summarized as follows: To tackle the technical challenge of fine-tuning diffusion models using human feedback, we introduce two scalable fine-tuning approaches."