Key Concepts
Enhancing Fashion Image Editing with Multimodal Prompts.
Summary
The article introduces Ti-MGD, a novel approach for fashion image editing guided by text, human body poses, garment sketches, and fabric textures. It extends latent diffusion models to incorporate these multiple modalities and demonstrates its effectiveness through experimental evaluations on newly annotated datasets.
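As a rough illustration of what "incorporating multiple modalities" into a latent diffusion model can look like, the Python sketch below concatenates spatial conditions (pose map, garment sketch, inpainting mask) to the noisy latent along the channel dimension, while the text prompt enters through cross-attention. All tensor shapes, channel counts, and names are assumptions for illustration, not the paper's exact architecture.

```python
import torch

# Shapes, channel counts, and names below are illustrative assumptions,
# not the paper's exact architecture.
batch, latent_ch, h, w = 1, 4, 64, 64            # Stable Diffusion latent resolution
noisy_latent = torch.randn(batch, latent_ch, h, w)
pose_map     = torch.randn(batch, 18, h, w)      # e.g. 18 keypoint heatmaps (assumed)
sketch       = torch.randn(batch, 1, h, w)       # garment sketch resized to latent size
inpaint_mask = torch.randn(batch, 1, h, w)       # region of the garment to edit

# Spatial conditions are concatenated channel-wise with the noisy latent; the
# denoising UNet's first convolution must be widened to accept the extra channels.
unet_input = torch.cat([noisy_latent, inpaint_mask, pose_map, sketch], dim=1)

# The text prompt conditions the UNet separately, through cross-attention.
text_embeddings = torch.randn(batch, 77, 768)    # CLIP text features (SD v1.x size)

# A suitably extended UNet would then predict the noise, e.g.:
# noise_pred = unet(unet_input, timestep, encoder_hidden_states=text_embeddings)
print(unet_input.shape)  # torch.Size([1, 24, 64, 64])
```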
Abstract
- Importance of fashion illustration.
- Potential of computer vision in fashion design.
- Multimodal-conditioned fashion image editing approach.
Introduction
- Intersection of computer vision and fashion.
- Lack of attention to text-conditioned fashion image editing.
- Introduction of multimodal-conditioned fashion image editing task.
Proposed Method
- Human-centric image editing with multimodal prompts.
- Preliminaries on Stable Diffusion and CLIP (see the text-encoding snippet after this list).
- Conditioning on pose maps, sketches, and fabric textures.
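Complementing the spatial-conditioning sketch above, the snippet below illustrates the Stable Diffusion/CLIP preliminary: encoding a garment description with CLIP's text encoder to obtain the cross-attention context. The checkpoint is the publicly available SD v1.x text encoder, used here as an assumed stand-in, and the prompt is hypothetical.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Public Stable Diffusion v1.x text encoder, used as an assumed stand-in.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a red floral midi dress with short sleeves"   # hypothetical description
tokens = tokenizer(
    prompt,
    padding="max_length",
    max_length=tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    # The last hidden states serve as the cross-attention context in the UNet.
    text_embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(text_embeddings.shape)  # torch.Size([1, 77, 768])
```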
Collecting Multimodal Fashion Datasets
- Annotating the Dress Code and VITON-HD datasets with textual descriptions, garment sketches, and fabric textures (an illustrative sketch-extraction example follows below).
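The article summary does not detail how the garment sketches were produced; as a purely illustrative stand-in, the snippet below derives a sketch-like annotation from a Canny edge map restricted to the garment region. File paths and the mask format are hypothetical.

```python
import cv2
import numpy as np

# Hypothetical paths; the real datasets' layout and mask format may differ.
image = cv2.imread("dresscode/images/000001_1.jpg")
garment_mask = cv2.imread("dresscode/masks/000001_1.png", 0)  # binary garment mask

# Edge map over the whole image, then restricted to the garment region.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
sketch = np.where(garment_mask > 0, edges, 0).astype(np.uint8)

cv2.imwrite("dresscode/sketches/000001_1.png", sketch)
```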
Experimental Evaluation
- Implementation details and comparison with competitors.
- Evaluation metrics: FID, KID, PD (Pose Distance), SD (Sketch Distance), and TS (Texture Similarity); a FID/KID computation sketch follows below.
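PD, SD, and TS are paper-specific adherence metrics, but FID and KID are standard; the snippet below is a hedged sketch of how they could be computed with torchmetrics. The feature extractor settings and batch sizes are assumptions, not the article's exact evaluation code.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

# Assumed settings; the article's evaluation configuration is not specified here.
fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

# Dummy uint8 image batches stand in for real and generated fashion images.
real_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

for metric in (fid, kid):
    metric.update(real_images, real=True)
    metric.update(fake_images, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()  # KID returns mean and std over subsets
print("KID:", kid_mean.item(), "+/-", kid_std.item())
```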
Statistics
The proposed method achieves an FID of 3.46 in the paired setting on the Dress Code Multimodal dataset.
The Ti-MGD model achieves a KID of 0.66 in the unpaired setting on the VITON-HD Multimodal dataset.
Quotes
"The proposed approach aims to generate human-centric fashion images guided by multimodal prompts."
"Our contributions include proposing a novel task of multimodal-conditioned fashion image editing."