Core Concepts
This work introduces a fully unsupervised approach for representing textures with a compositional neural model: a texture is expressed as a composition of 2D Gaussian functions, each capturing an individual texton together with an associated appearance feature. This representation enables a wide range of texture editing applications.
Abstract
The authors present a neural representation that decomposes textures into geometric elements (textons) and appearance features in an unsupervised manner. Each texton is represented as a 2D Gaussian function with an associated appearance feature vector. This compositional representation offers both expressiveness and ease of editing, as textures can be edited by modifying the Gaussians within the latent space.
The key components of the approach are:
Encoding textures as a composition of Gaussians: The encoder maps an input texture into a structured latent representation consisting of a set of Gaussians, where each Gaussian represents a texton. The encoder architecture includes an appearance encoder, a segmentation network, and a Gaussian parameter estimation layer.
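Concretely, the structured latent can be pictured as a set of records, one per texton: a 2D Gaussian (position plus a covariance built from per-axis scales and a rotation) and an appearance vector. The sketch below is illustrative only; the field names, dimensions, and parameterization are assumptions, not the paper's exact architecture.

```python
import numpy as np

def make_texton(mean, scale, angle, appearance):
    """One latent texton: a 2D Gaussian (mean, covariance built from per-axis
    scales and a rotation angle) plus an appearance feature vector."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    cov = R @ np.diag(np.square(scale)) @ R.T   # Sigma = R S^2 R^T
    return {"mean": np.asarray(mean, float), "cov": cov,
            "appearance": np.asarray(appearance, float)}

# A texture is encoded as a set of such textons (illustrative values).
textons = [
    make_texton(mean=[8.0, 8.0],   scale=[2.0, 1.0], angle=0.3,
                appearance=np.ones(16)),
    make_texton(mean=[24.0, 12.0], scale=[1.5, 1.5], angle=0.0,
                appearance=np.zeros(16)),
]
```

Parameterizing the covariance through scales and a rotation keeps it symmetric positive definite by construction, which is why Gaussian-based representations commonly use this form.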
Gaussian splatting: The Gaussian splatting layer converts the latent Gaussians into a feature grid, which is then fed into a generator network to reconstruct the original texture.
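A minimal NumPy sketch of such a splatting step is below: each pixel receives a density-weighted blend of the textons' appearance vectors. This is an assumption-laden toy (dense evaluation, unnormalized densities, no gradients), not the paper's differentiable splatting layer.

```python
import numpy as np

def splat(textons, H, W):
    """Rasterize latent Gaussians into an H x W x D feature grid: each pixel
    gets a Gaussian-density-weighted average of the appearance vectors."""
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs, ys], axis=-1).astype(float)      # (H, W, 2), (x, y)
    D = len(textons[0]["appearance"])
    grid = np.zeros((H, W, D))
    weight = np.zeros((H, W))
    for t in textons:
        d = coords - t["mean"]                              # offsets to center
        inv = np.linalg.inv(t["cov"])
        m = np.einsum("hwi,ij,hwj->hw", d, inv, d)          # squared Mahalanobis
        w = np.exp(-0.5 * m)                                # unnormalized density
        grid += w[..., None] * t["appearance"]
        weight += w
    return grid / np.maximum(weight, 1e-8)[..., None]       # normalized blend

# Usage: one texton at (4, 4) with a 2-D appearance feature.
textons = [{"mean": np.array([4.0, 4.0]), "cov": np.eye(2),
            "appearance": np.array([1.0, 0.0])}]
feat = splat(textons, 8, 8)
```

The resulting feature grid is what the generator network would consume to reconstruct the texture.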
Unsupervised compositional texton learning: The authors introduce three key components to learn the compositional neural textons in an unsupervised manner:
Appearance reshuffling: Randomly permuting the appearance features across the Gaussians during training; since a stationary texture should look the same after such a swap, this enforces the reshuffling-invariance property.
Disentangling structure and appearance: Applying a consistency loss to ensure that the encoding of image transformations is consistent with the transformations applied directly to the Gaussians, promoting the disentanglement of structure and appearance.
Enforcing desired texton properties: Applying losses to encourage discrete objectness, spatial compactness, reshuffling invariance, and transformation equivariance of the learned textons.
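The first two components above can be sketched in a few lines: reshuffling permutes which appearance vector sits on which Gaussian, and the consistency term compares encoding a transformed image against transforming the Gaussians directly. The toy encoder and (x, y) mean convention below are hypothetical stand-ins, not the paper's networks or losses.

```python
import numpy as np

def reshuffle_appearance(means, feats, rng):
    """Appearance reshuffling: keep the Gaussians' positions fixed but
    randomly permute which appearance vector is attached to which Gaussian."""
    perm = rng.permutation(len(feats))
    return means, feats[perm]

def translation_consistency(encode, image, shift):
    """Consistency loss (translation case): encoding a shifted image should
    match shifting the Gaussian means directly, with appearance unchanged."""
    m0, f0 = encode(image)
    m1, f1 = encode(np.roll(image, shift, axis=(0, 1)))
    target = m0 + np.array([shift[1], shift[0]])  # means stored as (x, y)
    return float(np.mean((m1 - target) ** 2) + np.mean((f1 - f0) ** 2))

def toy_encode(img):
    """Hypothetical stand-in encoder: one 'texton' at the brightest pixel,
    with its intensity as a 1-D appearance feature."""
    y, x = np.unravel_index(np.argmax(img), img.shape)
    return np.array([[float(x), float(y)]]), np.array([[float(img.max())]])
```

For a perfectly equivariant encoder the consistency term is zero; during training it is minimized to push the learned encoder toward that behavior.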
The proposed compositional neural texture representation enables a wide range of texture editing applications, including texture transfer, image stylization, texture interpolation and morphing, revealing/modifying texture variations, edit propagation, and direct manipulation of textons.
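Two of these edits reduce to simple latent-space arithmetic: direct manipulation moves or resizes selected Gaussians, and interpolation blends two latent sets before decoding. The helpers below are a hedged sketch under the assumption of array-shaped latents and one-to-one texton matching; they are not the paper's editing interface.

```python
import numpy as np

def edit_textons(means, covs, feats, idx, translate=None, scale=None):
    """Direct texton manipulation: move or uniformly resize the selected
    Gaussians; the generator then resynthesizes the edited texture."""
    means, covs = means.copy(), covs.copy()
    if translate is not None:
        means[idx] = means[idx] + np.asarray(translate, float)
    if scale is not None:
        covs[idx] = covs[idx] * scale ** 2   # scaling a Gaussian scales Sigma by s^2
    return means, covs, feats

def interpolate_latents(latent_a, latent_b, t):
    """Texture interpolation/morphing: linearly blend two latent Gaussian
    sets (assumes textons are already matched one-to-one)."""
    return tuple((1 - t) * a + t * b for a, b in zip(latent_a, latent_b))

# Usage: move and enlarge texton 0, leaving texton 1 untouched.
means = np.zeros((2, 2)); covs = np.stack([np.eye(2)] * 2); feats = np.zeros((2, 3))
m2, c2, _ = edit_textons(means, covs, feats, 0, translate=[1.0, 2.0], scale=2.0)
```

Because synthesis is a single feed-forward pass through the generator, such edits can be previewed interactively rather than via per-edit optimization.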
Quotes
"Textures can be edited by modifying the compositional Gaussians within the latent space, and new textures can be efficiently synthesized by feeding the modified Gaussians through a generator network in a feed-forward manner."
"Our representation is human-intelligible, applicable to a wide range of textures that exhibit sufficient self-repetitions, and enhancing a variety of editing techniques."