The paper proposes Adaptive Style Incorporation (ASI), a novel solution to the text-driven style transfer task in text-to-image (T2I) diffusion models. The key challenge is to consistently preserve the structure of the source image while still producing effective stylization.
The authors first present the Siamese Cross-Attention (SiCA) module, which decouples the single-track cross-attention into a dual-track architecture to obtain separate content and style features. They then introduce the Adaptive Content-Style Blending (AdaBlending) module to implement mask-guided fine-grained style incorporation from a structure-consistent perspective.
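To make the dual-track idea concrete, below is a minimal PyTorch-style sketch of how a SiCA-like step and an AdaBlending-like step could be wired together. The function names, tensor shapes, and the threshold-based mask heuristic are illustrative assumptions for exposition, not the authors' actual implementation.

```python
import torch


def cross_attention(q, k, v):
    # Standard scaled dot-product cross-attention.
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v


def siamese_cross_attention(latent_q, content_kv, style_kv):
    # Dual-track attention: the same image queries attend separately to the
    # content-prompt and the style-prompt embeddings, yielding decoupled
    # content features and style features.
    f_content = cross_attention(latent_q, *content_kv)
    f_style = cross_attention(latent_q, *style_kv)
    return f_content, f_style


def ada_blending(f_content, f_style, threshold=0.5):
    # Hypothetical mask-guided blend: where content and style features
    # disagree strongly (structure-critical regions), keep the content
    # features; elsewhere, incorporate the style features.
    diff = (f_content - f_style).abs().mean(dim=-1, keepdim=True)
    mask = (diff > diff.quantile(threshold)).float()
    return mask * f_content + (1.0 - mask) * f_style


# Example with illustrative shapes: 4096 latent tokens, 77 prompt tokens, dim 320.
q = torch.randn(1, 4096, 320)
kc, vc = torch.randn(1, 77, 320), torch.randn(1, 77, 320)
ks, vs = torch.randn(1, 77, 320), torch.randn(1, 77, 320)
fc, fs = siamese_cross_attention(q, (kc, vc), (ks, vs))
blended = ada_blending(fc, fs)
```

The point of the sketch is the structure, not the specific mask rule: style is injected per feature location rather than through a single stylized prompt, which is what allows structure-critical regions to stay anchored to the content features.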
The proposed ASI mechanism performs feature-level, fine-grained style incorporation, which maintains better structure consistency than previous prompt-level, coarse-grained style-injection approaches. Experimental results show that ASI achieves markedly better structure preservation and stylization on real-world images, visual enhancement tasks, and style transfer for generated images.
Key insights distilled from:
Yanqi Ge, Jia... — arxiv.org, 04-11-2024
https://arxiv.org/pdf/2404.06835.pdf