Key Idea
This paper introduces AnyDesign, a novel mask-free diffusion-based model for realistic and versatile fashion image editing, addressing limitations of previous methods by handling diverse apparel types and complex backgrounds.
Statistics
The extended dataset (SHHQe) contains 114,077 training and 12,653 testing samples, encompassing nine apparel categories.
The model employs a downsampling factor of 8 in the autoencoder.
The denoising transformer consists of 28 DiT blocks with a channel size of 1,152, a patch size of 2, and 16 heads in cross-attention layers.
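To make the architecture figures above concrete, here is a minimal sketch that collects them in one place and derives two quantities they imply: the per-head width in cross-attention and the number of tokens the transformer sees. The class name `DiTConfig`, its method names, and the 512x256 input resolution are assumptions made for illustration; the source does not state the input resolution or how the model is configured in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTConfig:
    """Hypothetical container for the hyperparameters quoted in the text."""

    vae_downsample: int = 8      # autoencoder downsampling factor
    depth: int = 28              # number of DiT blocks
    hidden_size: int = 1152      # channel size
    patch_size: int = 2          # patch size applied to the latent
    cross_attn_heads: int = 16   # heads in cross-attention layers

    def head_dim(self) -> int:
        # Assumes cross-attention splits the full channel width across heads.
        if self.hidden_size % self.cross_attn_heads != 0:
            raise ValueError("hidden_size must be divisible by the head count")
        return self.hidden_size // self.cross_attn_heads

    def num_tokens(self, height: int, width: int) -> int:
        # Pixels shrink by the autoencoder factor, then by the patch size.
        stride = self.vae_downsample * self.patch_size
        if height % stride != 0 or width % stride != 0:
            raise ValueError(f"height and width must be multiples of {stride}")
        return (height // stride) * (width // stride)


cfg = DiTConfig()
head_dim = cfg.head_dim()
# 512x256 is an assumed example resolution, not a value from the source.
tokens = cfg.num_tokens(512, 256)
print(head_dim, tokens)
```

Under these assumptions, each pixel dimension is reduced by a combined factor of 16 (8 from the autoencoder, 2 from patchification), so the sequence length grows with image area divided by 256.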
Training uses the Adam optimizer with a learning rate of 1e-4 and 1,000 diffusion time steps.
Inference employs the SA-solver with a classifier-free guidance scale of 4.5.
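As a rough illustration of what a guidance scale of 4.5 does, the sketch below applies the standard classifier-free guidance rule, which extrapolates from the unconditional noise prediction toward the conditional one. The function name `apply_cfg` and the toy two-element inputs are hypothetical, and the SA-solver update itself is not shown; the source does not describe how the two are wired together in AnyDesign.

```python
from typing import List, Sequence


def apply_cfg(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    scale: float = 4.5,  # guidance scale quoted in the text
) -> List[float]:
    """Standard classifier-free guidance: u + scale * (c - u), elementwise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions (assumed values, for illustration only).
guided = apply_cfg([0.0, 1.0], [1.0, 1.0])
print(guided)
```

With a scale of 1 the rule returns the conditional prediction unchanged; values above 1, such as 4.5, push the result further in the direction the condition suggests, and entries where the two predictions agree are left as they are.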