The paper addresses the challenge of personalizing text-to-image synthesis to generate images aligned with diverse artistic styles. Existing personalization methods like DreamBooth struggle to capture the broad and abstract nature of artistic styles, which encompass complex visual elements like lines, shapes, textures, and color relationships.
To address this, the authors propose the StyleForge method, which consists of two key components:
Subdivision of artistic styles: The authors decompose each artistic style into two main components, characters and backgrounds. This division allows techniques to be developed that learn the style without bias toward either component.
Dual binding strategy: StyleForge uses around 15-20 target-style reference images, together with auxiliary images, to establish a foundational binding between a unique prompt (e.g., "[V] style") and the general features of the target style. The auxiliary images further enhance the acquisition of diverse attributes inherent to the target style (a sketch of one plausible training objective follows below).
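The summary does not spell out the training objective, but the dual binding strategy reads as a DreamBooth-like prior-preservation setup. The following minimal sketch illustrates one plausible formulation; `unet`, `scheduler`, `encode_prompt`, and the latent batches are placeholder names for generic diffusion-model components, not the authors' actual code.

```python
# A minimal sketch of one dual-binding training step, assuming a
# DreamBooth-like prior-preservation objective. All names here are
# placeholders for generic diffusion components (e.g. Stable Diffusion).
import torch
import torch.nn.functional as F

def dual_binding_loss(unet, scheduler, encode_prompt,
                      style_latents, aux_latents,
                      style_prompt="[V] style", aux_prompt="style",
                      prior_weight=1.0):
    """Bind the rare token '[V]' to the target style using the 15-20
    reference images, while auxiliary images act as a prior that
    preserves general style attributes."""
    def denoise_loss(latents, prompt):
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noisy = scheduler.add_noise(latents, noise, t)
        # encode_prompt is assumed to return text-encoder hidden states
        # whose batch size matches `latents`.
        pred = unet(noisy, t, encoder_hidden_states=encode_prompt(prompt)).sample
        return F.mse_loss(pred, noise)

    # Reconstruction on target-style references plus a weighted prior
    # term on the auxiliary images.
    return (denoise_loss(style_latents, style_prompt)
            + prior_weight * denoise_loss(aux_latents, aux_prompt))
```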
The authors also introduce Multi-StyleForge, which divides the target style into its components and maps each component to a unique identifier during training, improving the alignment between text and images across various styles.
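At inference time, that identifier mapping might look like the sketch below. The character/background split follows the paper, but the concrete tokens "[V1]"/"[V2]" and the prompt template are assumptions extrapolated from the "[V] style" convention above.

```python
# Illustrative inference-time prompt construction for a Multi-StyleForge-
# like setup; identifiers and template are assumed, not from the paper.
STYLE_IDENTIFIERS = {
    "character": "[V1] style",   # identifier trained on character references
    "background": "[V2] style",  # identifier trained on background references
}

def build_prompt(subject: str, component: str) -> str:
    """Compose a prompt that targets one learned style component."""
    return f"a {subject}, {STYLE_IDENTIFIERS[component]}"

print(build_prompt("girl holding an umbrella", "character"))
# -> "a girl holding an umbrella, [V1] style"
```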
Extensive experiments are conducted on six distinct artistic styles, demonstrating substantial improvements over existing personalization methods in both the quality of generated images and quantitative metrics such as FID, KID, and CLIP score.
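For reference, these metrics can be computed with off-the-shelf tooling. The sketch below uses torchmetrics (not necessarily the authors' evaluation code), with random placeholder tensors standing in for the real and generated image sets.

```python
# Hedged sketch: computing FID, KID, and CLIP score with torchmetrics.
# Images follow torchmetrics' convention of uint8 NCHW tensors in [0, 255];
# `real`, `fake`, and `prompts` are placeholder data.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

real = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
prompts = ["a castle, [V] style"] * 32  # hypothetical evaluation prompts

fid = FrechetInceptionDistance(feature=2048)
fid.update(real, real=True)
fid.update(fake, real=False)

kid = KernelInceptionDistance(subset_size=16)  # subset_size <= sample count
kid.update(real, real=True)
kid.update(fake, real=False)

clip = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
clip.update(fake, prompts)

print(f"FID: {fid.compute():.2f}")
kid_mean, kid_std = kid.compute()
print(f"KID: {kid_mean:.4f} +/- {kid_std:.4f}")
print(f"CLIP score: {clip.compute():.2f}")
```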