Core Concepts
GeoDiffuser is a unified method that enables common 2D and 3D image editing operations, such as object translation, 3D rotation, and object removal, while preserving object style and inpainting disoccluded regions.
Abstract
The paper introduces GeoDiffuser, a zero-shot optimization-based method that unifies common 2D and 3D image editing capabilities into a single approach. The key insight is to view image editing operations as geometric transformations and incorporate them directly within the attention layers of a pre-trained diffusion model.
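To make the "editing as geometric transformation" view concrete, here is a minimal sketch (not the authors' code) of how a 3D rigid edit can be expressed as a per-pixel mapping. It assumes a pinhole camera with hypothetical intrinsics K = (fx, fy, cx, cy), a known depth per pixel (for example from a monocular depth estimator), and a rotation about a chosen pivot such as the object's 3D center. The function names rotation_y and transform_pixel are illustrative only. With R set to the identity and t parallel to the image plane, points at a common depth shift uniformly, which corresponds to a plain 2D translation.

```python
import math


def rotation_y(theta):
    """3x3 rotation matrix about the camera's vertical (y) axis, theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def transform_pixel(u, v, depth, K, R, t, pivot=(0.0, 0.0, 0.0)):
    """Map pixel (u, v) with known depth through X' = R (X - pivot) + pivot + t.

    K = (fx, fy, cx, cy) are assumed pinhole intrinsics. Returns the new (u', v').
    """
    fx, fy, cx, cy = K
    # Back-project the pixel to a 3D point in camera coordinates.
    X = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    # Rigid transform about the pivot (e.g. the object's 3D center).
    Xp = [sum(R[i][j] * (X[j] - pivot[j]) for j in range(3)) + pivot[i] + t[i]
          for i in range(3)]
    # Re-project the transformed point onto the image plane.
    return fx * Xp[0] / Xp[2] + cx, fy * Xp[1] / Xp[2] + cy


K = (100.0, 100.0, 50.0, 50.0)
# Rotate an object by 180 degrees about a pivot 2 units in front of the camera.
print(transform_pixel(60, 50, 2.0, K, rotation_y(math.pi), [0, 0, 0], pivot=(0.0, 0.0, 2.0)))
```

Evaluating this mapping for every object pixel yields a dense correspondence between the original and edited layouts; in a real pipeline such a map would then drive the attention-level edit rather than a direct pixel warp.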
The main highlights are:
GeoDiffuser can perform a variety of 2D and 3D editing operations, including object translation, 3D rotation, and object removal, without the need for additional training.
The method leverages the attention mechanism in diffusion models, applying the desired geometric transformation inside the attention layers rather than directly to the image pixels, while preserving the object's style and inpainting disoccluded regions.
The optimization process is guided by a set of loss functions that penalize the deviation of the edit attention from the reference attention, preserve the background, and handle disocclusions.
Extensive qualitative and quantitative evaluations demonstrate that GeoDiffuser outperforms existing methods in terms of edit adherence, style preservation, and perceptual quality.
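The three kinds of loss described above can be sketched as a weighted sum. This is an illustrative assumption, not the paper's exact formulation: attention maps and features are flattened to lists, the reference attention is assumed to be already warped by the edit transform, and the disocclusion term is modeled, hypothetically, as the attention mass that queries in the newly exposed region place on the object's original location. The weights w_edit, w_bg and w_hole are placeholders.

```python
def masked_mse(a, b, mask):
    """Mean squared error over positions where mask is True (0.0 if none are)."""
    diffs = [(x - y) ** 2 for x, y, m in zip(a, b, mask) if m]
    return sum(diffs) / len(diffs) if diffs else 0.0


def total_loss(edit_attn, warped_ref_attn, edit_feat, ref_feat,
               edit_mask, bg_mask, hole_attn_on_object,
               w_edit=1.0, w_bg=1.0, w_hole=1.0):
    """Hypothetical weighted sum of edit, background and disocclusion terms."""
    # Edit term: edited attention should match the transformed reference attention
    # inside the object's target region.
    l_edit = masked_mse(edit_attn, warped_ref_attn, edit_mask)
    # Background term: features outside the edited regions should stay unchanged.
    l_bg = masked_mse(edit_feat, ref_feat, bg_mask)
    # Disocclusion term (assumed form): mean attention mass that each newly exposed
    # query places on keys at the object's original location.
    l_hole = (sum(hole_attn_on_object) / len(hole_attn_on_object)
              if hole_attn_on_object else 0.0)
    return w_edit * l_edit + w_bg * l_bg + w_hole * l_hole
```

In a zero-shot setting like the one described, a loss of this shape would be minimized with respect to the diffusion latents at inference time, leaving the pre-trained model's weights untouched.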
Stats
"We show extensive qualitative results that demonstrate that our method can perform multiple 2D and 3D editing operations using a single approach."
"Results show that our method outperforms existing methods quantitatively while being general enough to perform various kinds of edits."
Quotes
"Our key insight is to view image editing operations as geometric transformations."
"We show that these transformations can be directly incorporated into the attention layers in diffusion models to implicitly perform editing operations."
"GeoDiffuser is a zero-shot optimization-based method that operates without the need for any additional training and can use any diffusion model with attention."