The paper introduces ByteEdit, a framework that uses feedback learning to optimize generative image editing. ByteEdit builds three reward models, namely the Aesthetic reward model, the Alignment reward model, and the Coherent reward model, which target generation quality, instruction adherence, and consistency, respectively.
The key components of ByteEdit are:
Boost (Perceptual Feedback Learning): ByteEdit uses human feedback to guide the generative model toward better outputs. It relies on a carefully designed feedback data collection process and trains reward models on that data to provide supervision signals.
Comply (Image-Text Alignment with Coherence): ByteEdit adds two components. One assesses how well the generated content aligns with the user-specified prompt; the other checks pixel-level coherence with the unmasked region.
Accelerate (Adversarial and Progressive Training): ByteEdit proposes an adversarial training strategy that uses the coherent reward model as a discriminator, together with a progressive training approach that speeds up sampling while preserving output quality.
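As a rough illustration of how the three components could fit together, the sketch below combines the three reward signals into a single generator objective and treats the coherence score as a discriminator output. This is not the paper's implementation: the names RewardScores, feedback_loss, and discriminator_loss, the weights, and the choice of a non-saturating log loss are all assumptions made for this example, and the summary does not state the actual loss formulation.

```python
import math
from dataclasses import dataclass


@dataclass
class RewardScores:
    """Hypothetical container for the three reward model outputs."""
    aesthetic: float  # higher means a more pleasing result (Boost)
    alignment: float  # higher means a closer match to the prompt (Comply)
    coherence: float  # assumed probability in (0, 1] that the edit is
                      # consistent with the unmasked region (Comply/Accelerate)


def feedback_loss(scores: RewardScores,
                  w_aes: float = 1.0,
                  w_align: float = 1.0,
                  w_coh: float = 1.0) -> float:
    """Generator-side loss: lower is better.

    Aesthetic and alignment rewards are maximized directly. The coherence
    score is treated as a discriminator probability, so the generator
    minimizes -log(p), an assumed non-saturating adversarial term.
    """
    eps = 1e-8  # guards against log(0)
    adversarial = -math.log(max(scores.coherence, eps))
    return (-w_aes * scores.aesthetic
            - w_align * scores.alignment
            + w_coh * adversarial)


def discriminator_loss(p_real: float, p_fake: float) -> float:
    """Assumed binary cross-entropy update for the coherent reward model
    when it is trained as a discriminator: real images should score near 1
    and generated edits near 0."""
    eps = 1e-8
    return -math.log(max(p_real, eps)) - math.log(max(1.0 - p_fake, eps))


if __name__ == "__main__":
    good = RewardScores(aesthetic=0.9, alignment=0.9, coherence=0.9)
    poor = RewardScores(aesthetic=0.1, alignment=0.1, coherence=0.1)
    # A better-scoring edit yields a lower generator loss.
    print(feedback_loss(good) < feedback_loss(poor))
```

In an actual training loop, the two losses would be minimized alternately, with the generator's gradients flowing through the reward models; the scalar version here only shows the direction in which each signal pushes.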
In extensive user evaluations, ByteEdit surpasses leading generative image editing products, including Adobe, Canva, and MeiTu, in both generation quality and consistency. Compared with the baseline model, ByteEdit-Outpainting improves quality by 388% and consistency by 135%.
Source: arxiv.org