Key Concepts
TexSliders is a diffusion-based approach to texture editing that uses CLIP image embeddings to define intuitive editing directions from natural language prompts while preserving the identity of the input texture.
Summary
The authors propose TexSliders, a diffusion-based texture editing method that lets users define editing directions with simple text prompts, for example from "aged wood" to "new wood". The key idea is to find a direction in CLIP image embedding space that captures the desired edit while preserving the identity of the input texture.
The method works as follows:
The authors use a diffusion prior model to convert the text prompts into CLIP image embeddings, which are then used to condition a pre-trained diffusion model for texture generation.
To define an editing direction, the authors compute multiple CLIP image embeddings for the original and target prompts, and take the difference between their centroids as the initial direction.
To improve identity preservation, the authors select a subset of dimensions from the initial direction based on the relative intra- and inter-cluster variability. This helps to focus the edit on the desired attribute while minimizing changes to the texture identity.
The final editing direction is then used to condition the diffusion model and generate the edited texture, allowing the user to control the intensity of the edit by adjusting the step size along the direction.
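The direction-finding steps above can be sketched in a few lines of Python. This is a minimal illustration on toy 4-dimensional vectors, not the authors' implementation: the function names are hypothetical, the embeddings stand in for CLIP image embeddings produced by the diffusion prior, and the selection rule (squared centroid separation divided by summed within-cluster variance, keeping the top-scoring dimensions) is an assumed reading of "relative intra- and inter-cluster variability" rather than the exact criterion from the paper.

```python
from statistics import fmean, pvariance


def centroid(embeddings):
    """Per-dimension mean of a list of equal-length vectors."""
    return [fmean(dim) for dim in zip(*embeddings)]


def initial_direction(source_embs, target_embs):
    """Initial edit direction: target centroid minus source centroid."""
    src_c, tgt_c = centroid(source_embs), centroid(target_embs)
    return [t - s for s, t in zip(src_c, tgt_c)]


def select_dimensions(source_embs, target_embs, keep):
    """Indices of the `keep` dimensions with the highest ratio of
    inter-cluster separation to intra-cluster variance.
    ASSUMPTION: this ratio is an illustrative criterion, not the paper's."""
    direction = initial_direction(source_embs, target_embs)
    scores = []
    columns = zip(zip(*source_embs), zip(*target_embs))
    for i, (src_dim, tgt_dim) in enumerate(columns):
        intra = pvariance(src_dim) + pvariance(tgt_dim)  # spread within each prompt
        inter = direction[i] ** 2                        # separation between prompts
        scores.append((inter / (intra + 1e-8), i))       # epsilon avoids division by zero
    top = sorted(scores, reverse=True)[:keep]
    return sorted(i for _, i in top)


def masked_direction(direction, kept):
    """Zero out every dimension that was not selected."""
    kept = set(kept)
    return [d if i in kept else 0.0 for i, d in enumerate(direction)]


def apply_edit(embedding, direction, step):
    """Move an embedding along the direction; `step` sets the edit intensity."""
    return [e + step * d for e, d in zip(embedding, direction)]


# Toy stand-ins for embeddings sampled from the two prompts.
source = [[0.0, 1.0, 5.0, 0.0], [0.2, 3.0, 5.0, 0.4]]
target = [[1.0, 2.0, 5.0, 0.5], [1.2, 4.0, 5.0, 0.9]]

kept = select_dimensions(source, target, keep=2)
direction = masked_direction(initial_direction(source, target), kept)
edited = apply_edit([0.1, 2.0, 5.0, 0.2], direction, step=0.5)
```

In this toy data, dimension 1 varies strongly within each prompt and dimension 2 does not change at all, so both are dropped from the direction and stay untouched in the edited embedding. In the method described above, the edited embedding would then condition the diffusion model, with `step` playing the role of the slider value.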
The authors evaluate their method on a diverse set of generated and real-world textures, demonstrating a wide range of edits, such as changes in weathering, scale, and roughness, while preserving the identity of the input texture. They also compare their approach to state-of-the-art diffusion-based image editing methods and report better edit adherence and better identity preservation.
Statistics
"Diffusion models have enabled intuitive image creation and manipulation using natural language."
"Textures are ubiquitous in image manipulation, graphic design, illustrations, rendering, and 3D modeling."
"Recent work has proposed to leverage attention maps for general diffusion-based image editing, but these are not as informative in the context of texture."
Quotes
"We explore a different solution: texture manipulations in the CLIP embedding space, akin to latent manipulations in GANs."
"Our approach allows to define new sliders for custom concepts with simple text prompts in a matter of minutes."