Core Concepts
The paper proposes a pipeline for generating advertising images of accessories that strictly preserve the product's identity (ID) while remaining controllable, using earrings as the running example.
Summary
The paper presents a pipeline for generating controllable advertising images of accessories with strict ID preservation, demonstrated on earrings. The key highlights are:
The pipeline builds on the ControlNet architecture and uses the earring image as the conditioning image, which ensures strict ID preservation of the accessory.
A multi-branch cross-attention architecture is proposed to enable fine-grained control over the scale, pose, and appearance of the generated model face, going beyond the limitations of text prompts.
To balance the influence of the different control branches, the authors introduce a standard-deviation based normalization (STD-Norm) mechanism and a time-dependent weighting (TDW) strategy.
Extensive experiments on earring-model image generation show that the proposed method outperforms existing customized generative models and in-painting approaches in both strict ID preservation and diversity of control.
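The summary does not give formulas for STD-Norm or TDW, so the following is only a minimal NumPy sketch of how such balancing could work, not the authors' implementation. It assumes that STD-Norm rescales each control branch's cross-attention output to the standard deviation of the text branch, and that TDW is a linear weight over the diffusion timestep. The function names, the linear schedule, and the direction of the schedule are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context):
    # Single-head attention without learned projections, for illustration only.
    # query: (n_q, d), context: (n_c, d) -> output: (n_q, d)
    scores = query @ context.T / np.sqrt(query.shape[-1])
    return softmax(scores) @ context

def std_norm(branch_out, reference_out, eps=1e-6):
    # ASSUMED form of STD-Norm: rescale the branch output so that its
    # standard deviation matches that of the reference (text) branch.
    return branch_out * (reference_out.std() / (branch_out.std() + eps))

def time_weight(t, num_steps, start=1.0, end=0.0):
    # HYPOTHETICAL time-dependent weight: linear in the timestep t, equal to
    # `start` at the noisiest step (t = num_steps - 1) and `end` at t = 0.
    frac = t / max(num_steps - 1, 1)
    return end + (start - end) * frac

def multi_branch_attention(query, text_ctx, control_ctxs, t, num_steps):
    # The text branch is the reference; each control branch is normalized
    # against it and then added with the same time-dependent weight.
    text_out = cross_attention(query, text_ctx)
    out = text_out.copy()
    w = time_weight(t, num_steps)
    for ctx in control_ctxs:
        out += w * std_norm(cross_attention(query, ctx), text_out)
    return out

# Usage with random stand-ins for the text, scale, pose and appearance embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 8))
text_ctx = rng.normal(size=(3, 8))
control_ctxs = [rng.normal(size=(5, 8)) * s for s in (0.1, 1.0, 50.0)]
out = multi_branch_attention(query, text_ctx, control_ctxs, t=25, num_steps=50)
print(out.shape)  # (4, 8)
```

Normalizing against one reference branch keeps a branch with large-magnitude embeddings from drowning out the others, and a timestep-dependent weight lets the control signals matter more at some stages of denoising than at others. Which stages the paper actually emphasizes is not stated in this summary.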
Statistics
The authors collected a dataset of 230K earring-model images from the internet, with captions extracted using BLIP and earrings segmented using Segment-Anything. They also collected an additional 3000 images for evaluation.