Personalized image generation aims to render subjects in novel scenes, styles, and actions.
Diffusion-based methods have significantly advanced this task.
Existing Methods:
Fine-tuning-based methods require several images of the specified subject for model optimization.
Tuning-free methods are trained on large-scale datasets and encode a reference image of any subject into embeddings used for personalization.
Proposed MM-Diff:
Integrates vision-augmented text embeddings and detail-rich subject embeddings into the diffusion model (see the embedding-injection sketch below).
Introduces cross-attention map constraints during training, enabling multi-subject image generation without predefined inputs such as layouts (see the loss sketch below).
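To make the embedding-injection idea concrete, below is a minimal sketch of how image-derived subject tokens could be fused with text embeddings before the diffusion UNet's cross-attention. The module name SubjectInjector, the resampler-style pooling, and all dimensions are illustrative assumptions, not MM-Diff's actual implementation.

```python
import torch
import torch.nn as nn


class SubjectInjector(nn.Module):
    """Pools image features into a few subject tokens and appends them to the text context."""

    def __init__(self, text_dim: int = 768, vision_dim: int = 1024, num_subject_tokens: int = 4):
        super().__init__()
        # Project vision features into the text-embedding space.
        self.vision_proj = nn.Linear(vision_dim, text_dim)
        # Learnable queries that pool the vision features into a few "detail-rich" subject tokens.
        self.queries = nn.Parameter(torch.randn(num_subject_tokens, text_dim))
        self.pool = nn.MultiheadAttention(text_dim, num_heads=8, batch_first=True)

    def forward(self, text_emb: torch.Tensor, vision_feats: torch.Tensor) -> torch.Tensor:
        # text_emb:     (B, L_text, text_dim) from the text encoder
        # vision_feats: (B, L_img, vision_dim) patch features from an image encoder (e.g., a CLIP ViT)
        v = self.vision_proj(vision_feats)                         # (B, L_img, text_dim)
        q = self.queries.unsqueeze(0).expand(v.size(0), -1, -1)    # (B, S, text_dim)
        subject_tokens, _ = self.pool(q, v, v)                     # (B, S, text_dim)
        # The concatenated sequence is what the UNet cross-attention attends to.
        return torch.cat([text_emb, subject_tokens], dim=1)        # (B, L_text + S, text_dim)
```

The fused sequence would then be passed as the cross-attention context of the UNet (e.g., as encoder_hidden_states in a diffusers-style pipeline).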
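The cross-attention map constraint can be read as a training-time loss that ties each subject's tokens to that subject's image region. The sketch below assumes per-subject segmentation masks and access to the UNet's cross-attention probabilities; the tensor layout and exact loss form are illustrative assumptions rather than MM-Diff's published formulation.

```python
import torch


def attention_map_loss(attn_probs: torch.Tensor,
                       subject_token_ids: torch.Tensor,
                       subject_masks: torch.Tensor) -> torch.Tensor:
    # attn_probs:        (B, heads, H*W, L) softmaxed cross-attention maps from one UNet layer
    # subject_token_ids: (B, S) long tensor; index of each subject's token in the context
    # subject_masks:     (B, S, H, W) binary masks marking each subject's region
    B, heads, hw, _ = attn_probs.shape
    masks = subject_masks.flatten(2).float()                             # (B, S, H*W)
    # Gather, per spatial location, the attention paid to each subject's token.
    idx = subject_token_ids[:, None, None, :].expand(B, heads, hw, -1)
    subj_attn = attn_probs.gather(-1, idx).mean(dim=1).transpose(1, 2)   # (B, S, H*W)
    # Reward attention inside each subject's mask and penalize it outside,
    # discouraging different subjects' tokens from binding to the same region.
    inside = (subj_attn * masks).sum(-1) / masks.sum(-1).clamp(min=1.0)
    outside = (subj_attn * (1.0 - masks)).sum(-1) / (1.0 - masks).sum(-1).clamp(min=1.0)
    return (outside - inside).mean()
```

A loss like this would be added to the standard denoising objective during training, so that at inference time multiple subjects can be sampled without layout inputs.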
Experimental Results:
MM-Diff outperforms other leading methods in subject fidelity and text consistency across various test sets.
Statistics:
"Personalization is expensive, as these methods typically need 10-30 minutes to fine-tune the model for each new subject using specially crafted data."
"Extensive experiments demonstrate the superior performance of MM-Diff over other leading methods."