Core Concepts
DragNoise, a novel interactive point-based image editing method, leverages diffusion semantic propagation to enable efficient and stable editing, manipulating the noise predicted at each step of the reverse diffusion process as a semantic editor.
Summary
The paper presents DragNoise, a novel interactive point-based image editing method that leverages diffusion semantic propagation. The key insights are:
- The bottleneck feature of the U-Net in diffusion models effectively captures comprehensive noise semantics; optimizing it at a single early denoising timestep is enough to reflect user edits (a minimal capture sketch follows this list).
- The optimized bottleneck feature can then be propagated to subsequent denoising steps, preserving the integrity of the complete diffusion semantics and avoiding the gradient-vanishing issues faced by previous methods.
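To make the notion of "bottleneck feature" concrete, here is a minimal sketch (not the paper's code) that captures the mid-block output of a diffusers-style Stable Diffusion U-Net with a forward hook; the checkpoint name, placeholder inputs, and timestep value are assumptions for illustration only.

```python
import torch
from diffusers import UNet2DConditionModel

# Assumption: a Stable Diffusion 1.5 U-Net; any diffusers UNet2DConditionModel
# with a mid_block works the same way.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

captured = {}

def grab_bottleneck(module, inputs, output):
    # The mid-block output is the lowest-resolution feature map, which
    # DragNoise treats as carrying the image's noise semantics.
    captured["bottleneck"] = output

hook = unet.mid_block.register_forward_hook(grab_bottleneck)

latents = torch.randn(1, 4, 64, 64)   # 512x512 image -> 64x64 latent
t = torch.tensor([35])                # placeholder for an early denoising step
text_emb = torch.randn(1, 77, 768)    # placeholder prompt embedding

with torch.no_grad():
    noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample

hook.remove()
print(captured["bottleneck"].shape)   # e.g. torch.Size([1, 1280, 8, 8])
```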
The DragNoise editing process involves two main stages:
Diffusion Semantic Optimization:
- The user provides anchor points and corresponding objective points.
- The bottleneck feature at a selected early timestep (e.g., t=35) is optimized so that the feature around each anchor point moves toward its objective point, producing the manipulation noise (a simplified sketch follows).
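This summary does not spell out the exact loss, so the sketch below stands in with a DragGAN-style motion-supervision objective applied directly to the captured feature map; optimize_bottleneck, the step count, learning rate, and patch radius are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def optimize_bottleneck(feat, anchors, objectives, steps=80, lr=0.01, r=3):
    """Illustrative diffusion semantic optimization (hypothetical helper).

    feat:       bottleneck feature map (1, C, H, W) from the early timestep
    anchors:    list of (row, col) anchor points in feature-map coordinates
    objectives: list of matching (row, col) objective points
    Points are assumed to lie at least r+1 cells away from the border.
    """
    feat = feat.detach().clone().requires_grad_(True)
    reference = feat.detach()                 # frozen copy for supervision
    opt = torch.optim.Adam([feat], lr=lr)

    for _ in range(steps):
        loss = feat.new_zeros(())
        for (ar, ac), (orow, ocol) in zip(anchors, objectives):
            # Unit step from the anchor toward its objective point.
            d = torch.tensor([orow - ar, ocol - ac], dtype=torch.float32)
            d = d / (d.norm() + 1e-8)
            sr = int(round(ar + d[0].item()))
            sc = int(round(ac + d[1].item()))
            # Pull the feature patch one step ahead of the anchor toward the
            # frozen patch at the anchor, dragging semantics along d.
            moved = feat[..., sr - r:sr + r + 1, sc - r:sc + r + 1]
            source = reference[..., ar - r:ar + r + 1, ac - r:ac + r + 1]
            loss = loss + F.l1_loss(moved, source)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return feat.detach()   # the edited bottleneck semantics
```

In a full implementation the anchor points would also be re-tracked as the feature evolves; the single fixed drag direction here keeps the sketch short.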
Diffusion Semantic Propagation:
- The optimized bottleneck feature is copied and substituted in the subsequent denoising steps to emphasize the editing effect.
- After a certain timestep (e.g., t'=10), propagation stops so the remaining denoising steps can refine the final image (see the sketch below).
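Continuing the sketches above (unet, latents, and text_emb from the capture sketch; the edited feature from optimize_bottleneck), the following hedged sketch substitutes the mid-block output via a forward hook during the propagated steps. The DDIM schedule and the mapping of t=35 / t'=10 (steps remaining) onto loop indices are my assumptions, not the paper's code.

```python
import torch
from diffusers import DDIMScheduler

scheduler = DDIMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)
scheduler.set_timesteps(50)

# Assumed mapping of the paper's t=35 / t'=10 (denoising steps remaining)
# onto 0-based indices over 50 DDIM steps.
edit_step, stop_step = 15, 40

# Example points are placeholders on the 8x8 bottleneck feature map.
optimized_bottleneck = optimize_bottleneck(
    captured["bottleneck"], anchors=[(4, 3)], objectives=[(4, 5)]
)

def substitute(module, inputs, output):
    # Returning a value from a forward hook overrides the module's output,
    # so each propagated step reuses the edited bottleneck semantics.
    return optimized_bottleneck

hook = None
for i, t in enumerate(scheduler.timesteps):
    if i == edit_step:                # begin propagating the edited feature
        hook = unet.mid_block.register_forward_hook(substitute)
    if i == stop_step and hook is not None:
        hook.remove()                 # t' reached: let plain denoising refine
        hook = None
    with torch.no_grad():
        noise_pred = unet(latents, t, encoder_hidden_states=text_emb).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample
```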
Extensive experiments demonstrate that DragNoise significantly outperforms existing GAN-based and diffusion-based point-based editing methods in editing accuracy, semantic preservation, and optimization efficiency. In particular, it reduces optimization time by over 50% compared to the recent DragDiffusion approach.
Statistics
DragNoise cuts down the optimization time by over 50% compared to DragDiffusion.
Editing a 512×512 image takes approximately 10 seconds with DragNoise, whereas DragDiffusion requires over 22 seconds.