The paper introduces the ShoeModel system, which aims to generate hyper-realistic advertising images of user-specified shoes worn by human models. The system consists of three key modules:
Wearable-area Detection (WD) Module: This module detects the visible and wearable areas of the input shoe image, allowing the system to avoid occlusion issues when generating the final image.
Leg-pose Synthesis (LpS) Module: This module generates diverse and plausible leg poses that align with the given shoe image, providing reasonable pose constraints for the subsequent human body generation.
Shoe-wearing (SW) Module: This module combines the processed shoe image and the synthesized leg pose to generate the final hyper-realistic advertising image, while ensuring the identity of the input shoes is maintained.
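The data flow between the three modules can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `ShoePipeline`, its field names, and the callable signatures are hypothetical, and the sketch assumes that the LpS stage consumes the shoe image as processed by the WD stage, which the summary does not state explicitly. Each stage is passed in as a plain callable so that no particular model or library is implied.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Placeholder types: the summary does not say how images or poses are represented.
Image = Any
Pose = Any


@dataclass
class ShoePipeline:
    """Hypothetical wrapper that chains the three stages in the order WD -> LpS -> SW."""

    detect_wearable_area: Callable[[Image], Image]  # WD: raw shoe image -> processed shoe image
    synthesize_leg_pose: Callable[[Image], Pose]    # LpS: processed shoe image -> leg pose
    wear_shoe: Callable[[Image, Pose], Image]       # SW: processed shoe + pose -> final image

    def run(self, shoe_image: Image) -> Image:
        # Assumption: the WD output is what the later stages consume.
        processed = self.detect_wearable_area(shoe_image)
        pose = self.synthesize_leg_pose(processed)
        return self.wear_shoe(processed, pose)
```

Keeping the stages as injected callables means each one can be swapped or stubbed independently, which mirrors the modular description above without committing to any specific model.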
The authors also introduce a custom shoe-wearing dataset to support training of the proposed system. In the reported experiments, ShoeModel generates high-quality, realistic images that preserve the identity of the user-specified shoes and show plausible interactions between the shoes and the human models, and it outperforms the baseline methods it is compared against.
Key insights distilled from: Binghui Chen... at arxiv.org, 04-09-2024
https://arxiv.org/pdf/2404.04833.pdf