SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Core Concepts
SwiftBrush introduces an image-free distillation scheme for one-step text-to-image generation, achieving high-quality results without relying on any training image data.
Summary
Abstract:
Text-to-image diffusion models suffer from slow, iterative sampling, which makes inference costly.
SwiftBrush presents a novel image-free distillation scheme that turns a multi-step teacher into a one-step text-to-image generator.
Introduction:
Diffusion models have attracted broad attention for generative tasks, but they require many denoising steps at inference.
Time-step distillation is an effective way to reduce the number of sampling steps, ideally down to a single step (a toy comparison of multi-step and one-step sampling is sketched below).
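To make the cost difference concrete, here is a minimal sketch, assuming plain PyTorch and a placeholder `Denoiser` module (not SwiftBrush's actual network), contrasting a 50-step iterative sampler with the single forward pass a distilled one-step student needs:

```python
# Toy illustration only: multi-step diffusion sampling vs. one-step generation.
# `Denoiser` and the linear update rule are placeholders, not the paper's code.
import torch
import torch.nn as nn


class Denoiser(nn.Module):
    """Placeholder epsilon-prediction network (assumed toy architecture)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))


@torch.no_grad()
def multi_step_sample(model: Denoiser, steps: int = 50, dim: int = 64) -> torch.Tensor:
    """Iterative sampling: `steps` sequential network evaluations."""
    x = torch.randn(1, dim)
    for i in reversed(range(steps)):
        t = torch.tensor([[i / steps]])
        eps = model(x, t)
        x = x - eps / steps          # toy update; real samplers follow a proper schedule
    return x


@torch.no_grad()
def one_step_sample(student: Denoiser, dim: int = 64) -> torch.Tensor:
    """A distilled student maps noise to an output in a single evaluation."""
    z = torch.randn(1, dim)
    return student(z, torch.ones(1, 1))


teacher, student = Denoiser(), Denoiser()
print(multi_step_sample(teacher).shape, one_step_sample(student).shape)
```

The distilled student trades 50 network evaluations for one, which is the speedup that time-step distillation targets.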
Related Work:
Previous methods focus on improving the inference speed of diffusion-based text-to-image generation.
Proposed Method:
SwiftBrush adapts Variational Score Distillation, an insight from text-to-3D synthesis, to distill a pretrained text-to-image diffusion model into a one-step generator without any image data (a simplified training-step sketch follows).
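The structure of such a training step can be illustrated with a deliberately simplified sketch of a Variational-Score-Distillation-style update, assuming toy linear modules in place of the real diffusion U-Nets and a toy noising schedule; it shows the two alternating updates (one-step student vs. LoRA teacher), not SwiftBrush's released implementation.

```python
# Simplified VSD-style training step (assumed, toy-scale). All modules are
# placeholders: `student` is the one-step generator, `frozen_teacher` stands in
# for the pretrained diffusion model, `lora_teacher` for its trainable copy.
import torch
import torch.nn as nn

dim = 64
student = nn.Linear(dim, dim)                    # one-step generator: noise -> sample
frozen_teacher = nn.Linear(dim + 1, dim)         # pretrained score network (kept frozen)
lora_teacher = nn.Linear(dim + 1, dim)           # trainable copy (LoRA-adapted in practice)
for p in frozen_teacher.parameters():
    p.requires_grad_(False)

opt_student = torch.optim.AdamW(student.parameters(), lr=1e-5)
opt_lora = torch.optim.AdamW(lora_teacher.parameters(), lr=1e-4)


def add_noise(x, t, eps):
    # toy linear noising; real code uses the diffusion noise schedule
    return (1 - t) * x + t * eps


for step in range(2):                            # a couple of toy iterations
    z = torch.randn(8, dim)
    x = student(z)                               # one-step generation
    t = torch.rand(8, 1)
    eps = torch.randn_like(x)
    inp = torch.cat([add_noise(x, t, eps), t], dim=-1)

    # Student update: the VSD gradient on x is the gap between the frozen
    # teacher's score and the LoRA teacher's score (computed without autograd).
    with torch.no_grad():
        grad = frozen_teacher(inp) - lora_teacher(inp)
    loss_student = (grad * x).sum()              # surrogate whose grad w.r.t. x equals `grad`
    opt_student.zero_grad()
    loss_student.backward()
    opt_student.step()

    # LoRA-teacher update: ordinary denoising loss on the student's own samples.
    x_t = add_noise(x.detach(), t, eps)
    loss_lora = ((lora_teacher(torch.cat([x_t, t], dim=-1)) - eps) ** 2).mean()
    opt_lora.zero_grad()
    loss_lora.backward()
    opt_lora.step()
```

Because the supervision comes entirely from the two score networks evaluated on the student's own outputs, no real images are needed, which is what makes the scheme image-free.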
Experiments:
Evaluation uses zero-shot FID and CLIP score on the COCO 2014 benchmark, together with the HPSv2 human-preference score (a CLIP-score sketch is given below).
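For reference, a minimal sketch of the standard CLIP-score computation, 100 * max(cos(image embedding, text embedding), 0) averaged over prompt/image pairs; the embeddings here are random placeholders, whereas a real evaluation would take them from a pretrained CLIP encoder and the generated images.

```python
# CLIP-score sketch with placeholder embeddings (not a full evaluation pipeline).
import torch
import torch.nn.functional as F


def clip_score(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """image_emb, text_emb: (N, D) embeddings for matched image/prompt pairs."""
    sims = F.cosine_similarity(image_emb, text_emb, dim=-1)
    return (100.0 * sims.clamp(min=0)).mean()


# Placeholder batch: 4 image/prompt pairs with 512-d embeddings (assumed size).
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
print(float(clip_score(img, txt)))
```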
Results:
SwiftBrush outperforms prior approaches on zero-shot text-to-image benchmarks while sampling in a single step.
Analysis:
An ablation study demonstrates the importance of both the LoRA teacher and the student parameterization in SwiftBrush training (a minimal LoRA adapter is sketched below).
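To show the kind of parameterization the LoRA teacher relies on, here is a minimal sketch of a LoRA-style adapter around a linear layer: the frozen base weight is augmented with a trainable low-rank update B·A. Rank, scaling, and layer shapes are illustrative assumptions, not SwiftBrush's exact configuration.

```python
# Generic LoRA adapter sketch (assumed hyperparameters, not the paper's setup).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # base weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen base projection plus low-rank trainable correction
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())


layer = LoRALinear(nn.Linear(64, 64), rank=4)
out = layer(torch.randn(2, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # only the low-rank A/B factors are trainable
```

Training only the low-rank factors keeps the extra teacher cheap to adapt while leaving the pretrained weights untouched.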
Conclusion and Discussion:
SwiftBrush offers efficient and accessible text-to-image generation, with potential for future extensions.