Personalized image generation aims to render subjects in novel scenes, styles, and actions.
Diffusion-based methods have substantially advanced this task.
Existing Methods:
Fine-tuning-based methods require several images of the specified subject and optimize the model per subject, which typically takes 10-30 minutes for each new subject.
Tuning-free methods instead train once on large-scale datasets and encode any reference image into embeddings, so no per-subject optimization is needed (a minimal sketch of this paradigm follows).
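To make the tuning-free paradigm concrete, here is a minimal PyTorch sketch: a stand-in for a frozen image encoder maps a reference-image feature to a few text-space tokens, which are concatenated with the prompt embeddings as conditioning. All module names, token counts, and dimensions are illustrative assumptions, not any specific method's architecture.

```python
import torch
import torch.nn as nn

class SubjectEncoder(nn.Module):
    """Sketch of a tuning-free subject encoder (illustrative, not MM-Diff's)."""
    def __init__(self, img_dim=1024, txt_dim=768, num_tokens=4):
        super().__init__()
        # Stand-in for a frozen vision backbone (e.g. a CLIP image encoder).
        self.backbone = nn.Linear(img_dim, img_dim)
        # Project the image feature into `num_tokens` text-space tokens.
        self.proj = nn.Linear(img_dim, txt_dim * num_tokens)
        self.num_tokens, self.txt_dim = num_tokens, txt_dim

    def forward(self, img_feat):                        # (B, img_dim)
        h = self.backbone(img_feat)
        tokens = self.proj(h)                           # (B, txt_dim * num_tokens)
        return tokens.view(-1, self.num_tokens, self.txt_dim)

# One forward pass: no per-subject optimization at inference time.
encoder = SubjectEncoder()
img_feat = torch.randn(1, 1024)                         # reference-image feature
text_emb = torch.randn(1, 77, 768)                      # prompt embeddings
cond = torch.cat([text_emb, encoder(img_feat)], dim=1)  # (1, 81, 768) conditioning
print(cond.shape)
```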
Proposed MM-Diff:
Integrates vision-augmented text embeddings and a small number of detail-rich subject embeddings into the diffusion model through cross-attention (see the first sketch after this list).
Introduces cross-attention map constraints during training, enabling flexible multi-subject image generation at inference without predefined inputs such as layouts (see the second sketch after this list).
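The two embedding streams in the first bullet can be sketched as follows: a projected CLS embedding augments the text embeddings, while learned queries attend to the patch embeddings to form a few detail-rich subject embeddings. The fusion scheme, the attention resampler, and all dimensions here are assumptions for illustration, not MM-Diff's published architecture.

```python
import torch
import torch.nn as nn

class MultiModalConditioner(nn.Module):
    """Sketch of two conditioning streams (assumed layers, not the paper's)."""
    def __init__(self, vis_dim=1024, txt_dim=768, num_subject_tokens=4):
        super().__init__()
        self.cls_proj = nn.Linear(vis_dim, txt_dim)        # CLS -> text space
        self.queries = nn.Parameter(torch.randn(num_subject_tokens, txt_dim))
        self.resample = nn.MultiheadAttention(txt_dim, 8, kdim=vis_dim,
                                              vdim=vis_dim, batch_first=True)

    def forward(self, text_emb, cls_tok, patch_toks):
        # Vision-augmented text embeddings: add projected CLS to every token.
        aug_text = text_emb + self.cls_proj(cls_tok).unsqueeze(1)
        # Detail-rich subject embeddings: learned queries attend to patches.
        q = self.queries.unsqueeze(0).expand(text_emb.size(0), -1, -1)
        subj, _ = self.resample(q, patch_toks, patch_toks)
        # Both streams are concatenated for the U-Net's cross-attention.
        return torch.cat([aug_text, subj], dim=1)

cond = MultiModalConditioner()
out = cond(torch.randn(1, 77, 768),    # text embeddings
           torch.randn(1, 1024),       # CLS embedding
           torch.randn(1, 256, 1024))  # patch embeddings
print(out.shape)                       # (1, 81, 768)
```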
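The cross-attention map constraint in the second bullet can be illustrated, under assumptions, as a training loss that keeps the attention mass of each subject's tokens inside that subject's segmentation mask, discouraging attribute blending between subjects. The exact loss used by MM-Diff may differ; this sketch only conveys the idea.

```python
import torch

def attn_map_constraint(attn, masks, token_ids):
    """
    Hypothetical attention-map constraint loss (illustrative only).
    attn:      (B, HW, T) cross-attention maps (image locations x text tokens)
    masks:     (B, S, HW) binary mask per subject, flattened spatially
    token_ids: list of S lists, the token indices belonging to each subject
    """
    loss = attn.new_zeros(())
    for s, ids in enumerate(token_ids):
        subj_attn = attn[:, :, ids].mean(dim=-1)        # (B, HW)
        inside = (subj_attn * masks[:, s]).sum(dim=-1)  # mass inside the mask
        total = subj_attn.sum(dim=-1) + 1e-8
        loss = loss + (1.0 - inside / total).mean()     # penalize mass outside
    return loss / len(token_ids)

# Toy example with two subjects occupying disjoint halves of a 64x64 map.
attn = torch.rand(2, 64 * 64, 77).softmax(dim=-1)
masks = torch.zeros(2, 2, 64 * 64)
masks[:, 0, :2048] = 1
masks[:, 1, 2048:] = 1
print(attn_map_constraint(attn, masks, [[5, 6], [9]]))
```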
Experimental Results:
MM-Diff outperforms other leading methods in subject fidelity and text consistency across various test sets.
Key Quotes:
"Personalization is expensive, as these methods typically need 10-30 minutes to fine-tune the model for each new subject using specially crafted data."
"Extensive experiments demonstrate the superior performance of MM-Diff over other leading methods."