The paper proposes Adaptive Style Incorporation (ASI), a method for text-driven style transfer with text-to-image (T2I) diffusion models. The key challenge is to preserve the structure of the source image consistently while still producing a convincing style transfer.
The authors first present the Siamese Cross-Attention (SiCA) module, which decouples the single-track cross-attention into a dual-track architecture to obtain separate content and style features. They then introduce the Adaptive Content-Style Blending (AdaBlending) module, which blends these two feature sets under mask guidance so that style is incorporated at a fine-grained level while the image structure stays consistent.
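The dual-track idea and the mask-guided blend can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary does not give the paper's equations, so the shared projection weights, the binary per-position mask, the linear blend, and every name here (`cross_attention`, `siamese_cross_attention`, `blend`) are assumptions made for illustration only. In particular, how the paper derives its mask is not described here, so the mask is simply taken as an input.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, text_emb, w_k, w_v):
    """Single-track cross-attention: image queries attend to text tokens."""
    keys = text_emb @ w_k
    values = text_emb @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores) @ values


def siamese_cross_attention(queries, content_emb, style_emb, w_k, w_v):
    """Dual-track variant (assumed): same queries and weights, two prompts."""
    content_feat = cross_attention(queries, content_emb, w_k, w_v)
    style_feat = cross_attention(queries, style_emb, w_k, w_v)
    return content_feat, style_feat


def blend(content_feat, style_feat, mask):
    """Mask-guided blend (assumed form): mask=1 takes style, mask=0 keeps content."""
    mask = mask[:, None]  # broadcast the per-position mask over channels
    return mask * style_feat + (1.0 - mask) * content_feat


# Toy inputs: 16 spatial positions, 4 text tokens, feature width 8.
rng = np.random.default_rng(0)
n_pos, n_tok, d = 16, 4, 8
queries = rng.normal(size=(n_pos, d))
content_emb = rng.normal(size=(n_tok, d))  # stands in for the content prompt
style_emb = rng.normal(size=(n_tok, d))    # stands in for the style prompt
w_k = rng.normal(size=(d, d))
w_v = rng.normal(size=(d, d))
mask = (rng.random(n_pos) > 0.5).astype(float)  # hypothetical mask

content_feat, style_feat = siamese_cross_attention(
    queries, content_emb, style_emb, w_k, w_v
)
blended = blend(content_feat, style_feat, mask)
```

Under these assumptions, positions where the mask is 0 keep the content-track features unchanged, which is one simple way to picture how blending at the feature level could leave structure intact.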
ASI incorporates style at the feature level in a fine-grained way, which maintains structure consistency better than previous approaches that inject style coarsely at the prompt level. According to the reported experiments, ASI performs much better in both structure preservation and stylization quality on real-world images, visual enhancement tasks, and style transfer for generated images.
Key insights distilled from
by Yanqi Ge, Jia... at arxiv.org, 04-11-2024
https://arxiv.org/pdf/2404.06835.pdf