Core Concepts
Resolving pose bias and identity loss in zero-shot customization through harmonizing visual and textual embeddings.
Summary
This paper addresses zero-shot text-to-image customization, where the visual embedding of the reference subject conflicts with the textual embeddings of the prompt, producing pose bias toward the reference image and loss of subject identity. It proposes two remedies: an orthogonal visual embedding that resolves the pose bias, and a self-attention swap that restores the lost identity. The method is evaluated through qualitative and quantitative experiments, a user study, ablations, and comparisons with existing models.
1. Introduction
- The recent surge of text-to-image (T2I) models.
- Aims of subject-driven image generation: rendering a given subject in new contexts.
- Challenges of per-subject optimization (fine-tuning for every new subject).
2. Related Works
- Diffusion models for image synthesis.
- Subject-driven generation methods.
- Compositional generation approaches.
3. Preliminaries
- Text-to-image latent diffusion model (LDM).
- Cross-attention mechanism for conditioning on text prompts (sketched below).
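For orientation, here is a minimal single-head sketch of the cross-attention step through which an LDM conditions on the prompt: queries come from the spatial latent features, keys and values from the text-encoder token embeddings. Tensor shapes and names are illustrative, not the paper's notation.

```python
import torch
import torch.nn.functional as F

def cross_attention(latents, text_tokens, W_q, W_k, W_v):
    """Single-head cross-attention: spatial latents attend to text tokens.

    latents:     (N, d) flattened spatial features of a U-Net layer
    text_tokens: (L, d) token embeddings from the text encoder
    """
    q = latents @ W_q        # (N, d) queries from the image side
    k = text_tokens @ W_k    # (L, d) keys from the prompt tokens
    v = text_tokens @ W_v    # (L, d) values from the prompt tokens
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (N, L)
    return attn @ v          # each spatial position mixes token values

# Toy usage with random weights.
d = 64
latents = torch.randn(16 * 16, d)   # a 16x16 feature map, flattened
text_tokens = torch.randn(77, d)    # CLIP-style token sequence length
W_q, W_k, W_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
print(cross_attention(latents, text_tokens, W_q, W_k, W_v).shape)  # (256, 64)
```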
4. Methods
4.1 Discord among Contextual Embeddings
- Conflict between the injected visual embedding and the textual prompt embeddings, biasing the output pose toward the reference image.
4.2 Contextual Embedding Orchestration
- Proposed orthogonal visual embedding (sketched below).
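The paper's exact formulation is not reproduced here; the sketch below shows one natural reading of an orthogonal visual embedding, assuming a Gram-Schmidt-style projection: subtract from the visual embedding its component along the textual embedding, so the injected subject token no longer competes with the direction the prompt occupies.

```python
import torch

def orthogonal_visual_embedding(visual, textual, eps=1e-8):
    """Project `visual` onto the orthogonal complement of `textual`
    (a single Gram-Schmidt step), removing the component that
    competes with the text direction."""
    t = textual / (textual.norm() + eps)  # unit vector along the text embedding
    return visual - (visual @ t) * t      # <result, t> == 0 up to float error

# Toy check with random embeddings (dimension is illustrative).
v, t = torch.randn(768), torch.randn(768)
v_perp = orthogonal_visual_embedding(v, t)
print(torch.dot(v_perp, t / t.norm()).abs().item())  # ~0
```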
4.3 Self-Attention Swap
- Resolving identity loss with a self-attention swap (sketched below).
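Again a hedged sketch rather than the paper's code: a common realization of such a swap, assumed here, runs a parallel denoising pass on the reference image and, in selected self-attention layers, lets the generation path take its keys and values from that reference pass, so fine identity detail is copied in while the queries keep the generated layout.

```python
import torch
import torch.nn.functional as F

def swapped_self_attention(x_gen, x_ref, W_q, W_k, W_v):
    """Self-attention with swapped K/V: queries from the generation
    path, keys and values from a reference pass on the subject image."""
    q = x_gen @ W_q   # layout/pose follows the generation path
    k = x_ref @ W_k   # identity detail follows the reference pass
    v = x_ref @ W_v
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v

# Toy shapes: two feature maps of the same layer, flattened.
d = 64
x_gen, x_ref = torch.randn(256, d), torch.randn(256, d)
W_q, W_k, W_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
print(swapped_self_attention(x_gen, x_ref, W_q, W_k, W_v).shape)  # (256, 64)
```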
5. Experiments
5.1 Qualitative Results
5.2 Quantitative Results
5.3 User Study
6. Conclusion & References