Personalized image generation aims to render a specified subject in novel scenes, styles, and actions.
Diffusion-based methods have substantially advanced this task.
Existing Methods:
Fine-tuning-based methods require several images of the specified subject and optimize the model for each new subject, which is costly (typically 10-30 minutes per subject).
Tuning-free methods train on large-scale datasets and encode any image into embeddings for personalization.
Proposed MM-Diff:
Integrates vision-augmented text embeddings and detail-rich subject embeddings into the diffusion model (see the first sketch after this list).
Introduces cross-attention map constraints during training, enabling multi-subject image generation without predefined inputs such as layouts (see the second sketch below).
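A minimal sketch of how the two conditioning streams might be combined, assuming a CLIP-style vision encoder that yields a global CLS embedding and patch embeddings; all module names, dimensions, and fusion details here are illustrative assumptions, not MM-Diff's actual implementation:

```python
import torch
import torch.nn as nn

class SubjectConditioning(nn.Module):
    """Illustrative fusion of vision-augmented text embeddings and
    detail-rich subject embeddings (hypothetical module, for intuition only)."""

    def __init__(self, text_dim=768, vision_dim=1024, num_subject_tokens=4):
        super().__init__()
        # Project the image CLS embedding into the text-embedding space
        # so it can augment the prompt tokens.
        self.cls_proj = nn.Linear(vision_dim, text_dim)
        # Project patch embeddings, then pool them into a small set of
        # detail-rich subject embeddings via learned queries.
        self.patch_proj = nn.Linear(vision_dim, text_dim)
        self.subject_queries = nn.Parameter(torch.randn(num_subject_tokens, text_dim))

    def forward(self, text_emb, cls_emb, patch_emb):
        # text_emb:  (B, L, text_dim)   prompt token embeddings
        # cls_emb:   (B, vision_dim)    global image embedding
        # patch_emb: (B, P, vision_dim) local patch embeddings
        # 1) Vision-augmented text embeddings: add the projected CLS
        #    embedding to every prompt token.
        aug_text = text_emb + self.cls_proj(cls_emb).unsqueeze(1)
        # 2) Detail-rich subject embeddings: simple cross-attention pooling
        #    of learned queries over projected patch embeddings.
        patches = self.patch_proj(patch_emb)                              # (B, P, D)
        q = self.subject_queries.unsqueeze(0).expand(patches.size(0), -1, -1)
        attn = torch.softmax(q @ patches.transpose(1, 2) / patches.size(-1) ** 0.5, dim=-1)
        subject_emb = attn @ patches                                      # (B, S, D)
        # 3) Concatenate both streams so the diffusion model's cross-attention
        #    layers can attend to prompt tokens and subject tokens jointly.
        return torch.cat([aug_text, subject_emb], dim=1)
```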
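A similarly hedged sketch of a cross-attention map constraint: during training, each subject token's attention is encouraged to concentrate inside that subject's region, which is one simple way to realize such a constraint; the exact objective used by MM-Diff may differ:

```python
import torch

def attention_map_constraint_loss(attn_maps, subject_masks, eps=1e-6):
    """Illustrative cross-attention map constraint (simplified sketch).

    attn_maps:     (B, S, H, W) cross-attention maps of S subject tokens
    subject_masks: (B, S, H, W) binary masks of the corresponding subjects
    """
    # Normalize each map so it sums to 1 over spatial positions.
    attn = attn_maps.flatten(2)
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)
    masks = subject_masks.flatten(2)
    # Fraction of attention mass falling inside the subject's mask;
    # the loss pushes this fraction toward 1 for every subject token,
    # so subjects stay spatially separated without a predefined layout.
    inside = (attn * masks).sum(dim=-1)
    return (1.0 - inside).mean()
```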
Experimental Results:
MM-Diff outperforms other leading methods in subject fidelity and text consistency across various test sets.
Source: arxiv.org (MM-Diff)
Statistics:
"Personalization is expensive, as these methods typically need 10-30 minutes to fine-tune the model for each new subject using specially crafted data."
"Extensive experiments demonstrate the superior performance of MM-Diff over other leading methods."