Key Concepts
Incorporating human feedback significantly improves instructional image editing models.
Summary
This paper introduces HIVE, a framework that uses human feedback to improve instructional visual editing. The authors collect human feedback on edited images to capture user preferences, then incorporate that feedback through scalable fine-tuning methods for diffusion models. In extensive experiments, HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin. The paper also discusses the technical challenges, methodology, experiments, ablation studies, and limitations of the approach.
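One generic way to fold a scalar human-preference reward into diffusion fine-tuning is to weight each training sample's denoising loss by its reward. The sketch below is only an illustration of that idea and is not HIVE's exact objective. The `exp(r / beta)` weighting, the temperature `beta`, and the function names `reward_weights` and `reward_weighted_loss` are all hypothetical choices made for this example. The per-sample losses stand in for a standard noise-prediction error such as ||eps - eps_theta||^2.

```python
import math
from typing import Sequence


def reward_weights(rewards: Sequence[float], beta: float = 1.0) -> list[float]:
    """Hypothetical softmax weights exp(r / beta), normalized to sum to 1.

    Subtracting the max reward before exponentiating keeps this numerically stable.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if not rewards:
        raise ValueError("need at least one reward")
    m = max(rewards)
    exps = [math.exp((r - m) / beta) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_loss(
    per_sample_losses: Sequence[float], rewards: Sequence[float], beta: float = 1.0
) -> float:
    """Reward-weighted average of per-sample denoising losses (illustrative only)."""
    if len(per_sample_losses) != len(rewards):
        raise ValueError("losses and rewards must have the same length")
    weights = reward_weights(rewards, beta)
    # Samples that humans (via the reward model) prefer contribute more to the update.
    return sum(w * loss for w, loss in zip(weights, per_sample_losses))


if __name__ == "__main__":
    losses = [0.8, 0.5, 1.2]
    rewards = [2.0, 0.5, -1.0]
    print(reward_weights(rewards, beta=1.0))
    print(reward_weighted_loss(losses, rewards, beta=1.0))
```

A smaller `beta` concentrates the weight on the highest-reward samples. A larger `beta` moves the objective toward the plain unweighted average.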
Structure:
- Introduction
- Abstract
- Related Work
- Methodology
- Experiments
- Baseline Comparisons
- Ablation Study
- Conclusion and Discussion
Statistics
The paper presents a new 1.1M training dataset, a 3.6K reward dataset, and a 1K evaluation dataset.
HIVE is favored over previous state-of-the-art instructional image editing approaches by a large margin.
The reward model $R_\phi(\tilde{x}, c)$ reflects human preferences for edited images.
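Reward models that reflect human preferences are commonly trained on human rankings with a pairwise Bradley-Terry loss, $-\log \sigma(r_{\text{preferred}} - r_{\text{rejected}})$. The sketch below assumes that setup. It does not claim to be how $R_\phi$ is trained in HIVE. It also assumes, hypothetically, that a human ranking of several edits of the same input is split into all (better, worse) pairs. The reward scores are plain floats standing in for $R_\phi(\tilde{x}, c)$ outputs, and the function names are illustrative.

```python
import math
from itertools import combinations
from typing import Sequence


def pairwise_preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_preferred - r_rejected)), computed stably."""
    d = r_preferred - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    # For d < 0: softplus(-d) = -d + log(1 + exp(d)), which avoids overflow.
    return -d + math.log1p(math.exp(d))


def ranking_loss(rewards_best_to_worst: Sequence[float]) -> float:
    """Average pairwise loss over every (better, worse) pair in one human ranking.

    combinations() preserves order, so the first element of each pair is the
    image the annotator ranked higher.
    """
    pairs = list(combinations(rewards_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked images")
    return sum(pairwise_preference_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Reward scores for three edits, listed in the order a human ranked them.
    print(ranking_loss([1.5, 0.3, -0.7]))  # low loss: scores agree with the ranking
    print(ranking_loss([-0.7, 0.3, 1.5]))  # high loss: scores contradict the ranking
```

Minimizing this loss pushes the reward model to score edits that annotators preferred above the ones they ranked lower.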
Quotes
"Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences."
"Our main contributions are summarized as follows: To tackle the technical challenge of fine-tuning diffusion models using human feedback, we introduce two scalable fine-tuning approaches."