The paper introduces ShoeModel, a system for generating hyper-realistic advertising images of user-specified shoes worn by human models. The system consists of three key modules (see the pipeline sketch after the list):
Wearable-area Detection (WD) Module: This module detects the visible and wearable areas of the input shoe image, allowing the system to avoid occlusion issues when generating the final image.
Leg-pose Synthesis (LpS) Module: This module generates diverse and plausible leg poses that align with the given shoe image, providing reasonable pose constraints for the subsequent human body generation.
Shoe-wearing (SW) Module: This module combines the processed shoe image and the synthesized leg pose to generate the final hyper-realistic advertising image, while ensuring the identity of the input shoes is maintained.
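To make the data flow between the three modules concrete, here is a minimal sketch of how they might compose at inference time. All class and method names (WearableAreaDetector, LegPoseSynthesizer, ShoeWearingGenerator, and their signatures) are hypothetical illustrations rather than the authors' actual API; the stub bodies stand in for the learned, diffusion-based components described in the paper.

```python
import numpy as np

# Hypothetical interfaces for the three ShoeModel modules. The real system uses
# learned networks; these stubs only illustrate the data flow:
# shoe image -> wearable-area mask -> leg pose -> final advertising image.

class WearableAreaDetector:
    """WD module: segments the visible, wearable area of the input shoe."""
    def detect(self, shoe_image: np.ndarray) -> np.ndarray:
        # Placeholder: a real detector would predict a per-pixel mask marking
        # regions that the generated leg/foot must not occlude.
        return np.ones(shoe_image.shape[:2], dtype=bool)

class LegPoseSynthesizer:
    """LpS module: proposes a plausible leg pose conditioned on the shoe."""
    def synthesize(self, shoe_image: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Placeholder: a real synthesizer would output keypoints or a pose map
        # aligned with the shoe's orientation; here we return dummy 2D keypoints.
        return np.zeros((17, 2), dtype=np.float32)

class ShoeWearingGenerator:
    """SW module: renders the final image from the shoe, mask, and pose."""
    def generate(self, shoe_image: np.ndarray, mask: np.ndarray,
                 leg_pose: np.ndarray) -> np.ndarray:
        # Placeholder: a real generator (e.g., a pose- and mask-conditioned
        # diffusion model) would paint a human model wearing the shoe while
        # keeping the masked shoe pixels unchanged to preserve its identity.
        return shoe_image.copy()

def shoemodel_pipeline(shoe_image: np.ndarray) -> np.ndarray:
    wd, lps, sw = WearableAreaDetector(), LegPoseSynthesizer(), ShoeWearingGenerator()
    mask = wd.detect(shoe_image)               # 1. find the wearable area
    pose = lps.synthesize(shoe_image, mask)    # 2. propose a compatible leg pose
    return sw.generate(shoe_image, mask, pose) # 3. generate the advertising image

if __name__ == "__main__":
    demo_shoe = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a real photo
    result = shoemodel_pipeline(demo_shoe)
    print(result.shape)
```

The sequential composition mirrors the paper's design: the wearable-area mask constrains what the later stages may overwrite, and the synthesized pose constrains the human body that the final generator produces.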
The authors also construct a custom shoe-wearing dataset to train the proposed system. Extensive experiments show that ShoeModel generates high-quality, realistic images that preserve the identity of the user-specified shoes and exhibit plausible interactions between the shoes and the human models, outperforming various baseline methods.