Key Concepts
RealCustom disentangles subject similarity from text controllability by progressively narrowing real text words, so that both can be optimized together in real-time, open-domain customization.
Abstract
RealCustom introduces a new paradigm for text-to-image customization that addresses the dual-optimum paradox: similarity to the given subject and controllability by the given text are hard to optimize at the same time. By progressively narrowing down real text words, it achieves high-quality similarity and controllability simultaneously. An adaptive scoring module and a mask guidance strategy make the customization efficient, with reported results superior to existing methods.
Existing works in text-to-image customization suffer from an entangled influence scope between similarity and controllability. RealCustom disentangles the two, which improves image quality and generalization capability. Through a "train-inference" framework, RealCustom performs this customization in real-time scenarios.
The evidence cited in support of RealCustom includes CLIP-T scores, ImageReward improvements, and qualitative comparisons with existing paradigms.
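As a rough illustration of the CLIP-T metric mentioned above: it is commonly computed as the cosine similarity between the CLIP text embedding of the prompt and the CLIP image embedding of the generated image, averaged over all prompt-image pairs. The sketch below assumes the embeddings have already been produced by some CLIP encoder and are passed in as plain lists of floats. The function names `cosine_similarity` and `clip_t_score` are hypothetical and are not taken from the RealCustom code or any library.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def clip_t_score(
    text_embeddings: Sequence[Sequence[float]],
    image_embeddings: Sequence[Sequence[float]],
) -> float:
    """Mean text-image cosine similarity over paired embeddings.

    Assumption: text_embeddings[i] and image_embeddings[i] come from the
    same CLIP model and belong to the same prompt / generated-image pair.
    """
    if not text_embeddings or len(text_embeddings) != len(image_embeddings):
        raise ValueError("need the same non-zero number of text and image embeddings")
    similarities = [
        cosine_similarity(t, i) for t, i in zip(text_embeddings, image_embeddings)
    ]
    return sum(similarities) / len(similarities)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real CLIP embeddings.
    texts = [[1.0, 0.0], [1.0, 0.0]]
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_t_score(texts, images))
```

A higher score indicates that generated images follow the text more closely, which is why CLIP-T is used as a proxy for controllability. Subject similarity has to be measured separately.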
Statistics
Existing works follow the pseudo-word paradigm [1].
DreamBooth uses rare tokens for better similarity [27].
ELITE introduces a multimodal encoder for subject representation [34].