Wang, Y., & Chen, L. (2024). Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification. arXiv preprint arXiv:2408.16266v2.
This paper addresses the limitations of existing diffusion-based image augmentation methods in balancing faithfulness and diversity when generating synthetic images for data-scarce image classification tasks. The authors propose a novel method, Diff-II, to improve augmentation quality and downstream classification performance.
Diff-II consists of three main steps:
1. Category Concepts Learning: Learnable token embeddings and low-rank matrices are incorporated into a pre-trained diffusion U-Net to learn accurate concept representations for each image category.
2. Inversion Interpolation: DDIM inversion is applied to each training image, conditioned on the learned concepts. Random pairs of inversions within the same category then undergo circle interpolation to generate new latent representations.
3. Two-stage Denoising: The interpolation results are denoised in two stages using different prompts. The first stage uses a prompt containing the learned concept and a randomly sampled suffix summarizing high-frequency context patterns; the second stage refines details using a prompt with only the learned concept.
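The circle interpolation in step 2 can be sketched as spherical interpolation (slerp) between two DDIM-inverted latents of the same category, which stays on the arc between them rather than cutting through the interior as linear blending would. This is a minimal illustrative sketch; the function name, the slerp formulation, and the latent shapes are assumptions for exposition, not the authors' exact implementation.

```python
import numpy as np

def circle_interpolate(z1, z2, lam):
    """Spherical ("circle") interpolation between two inverted latents.

    z1, z2 : DDIM inversions of two same-category images (same shape).
    lam    : interpolation coefficient in [0, 1].
    """
    z1_f, z2_f = z1.ravel(), z2.ravel()
    cos_theta = np.dot(z1_f, z2_f) / (np.linalg.norm(z1_f) * np.linalg.norm(z2_f))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # angle between latents
    if np.isclose(theta, 0.0):
        # Nearly parallel latents: fall back to plain linear interpolation.
        return (1 - lam) * z1 + lam * z2
    # Slerp: weighted combination along the great-circle arc.
    return (np.sin((1 - lam) * theta) * z1 + np.sin(lam * theta) * z2) / np.sin(theta)

# Interpolate a random same-category pair of (stand-in) inverted latents.
rng = np.random.default_rng(0)
z_a = rng.standard_normal((4, 64, 64))
z_b = rng.standard_normal((4, 64, 64))
z_new = circle_interpolate(z_a, z_b, lam=0.5)  # fed to the two-stage denoiser
```

The resulting `z_new` would then be passed through the two-stage denoising of step 3 to produce a synthetic image.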
Diff-II effectively addresses the faithfulness-diversity trade-off in diffusion-based image augmentation. By leveraging inversion circle interpolation and two-stage denoising, it generates high-quality synthetic images that improve the generalization ability of classifiers, particularly in data-scarce scenarios.
This research contributes a novel and effective method for data augmentation in image classification, particularly beneficial for fine-grained datasets and challenging scenarios with limited training data.
Key insights distilled from: Yanghao Wang et al., arxiv.org, 2024-11-22. https://arxiv.org/pdf/2408.16266.pdf