Wang, Y., & Chen, L. (2024). Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification. arXiv preprint arXiv:2408.16266v2.
This paper addresses the limitations of existing diffusion-based image augmentation methods in balancing faithfulness and diversity when generating synthetic images for data-scarce image classification tasks. The authors propose a novel method, Diff-II, to improve augmentation quality and downstream classification performance.
Diff-II consists of three main steps:
1) Category Concepts Learning: learnable token embeddings and low-rank matrices are incorporated into a pre-trained diffusion U-Net to learn an accurate concept representation for each image category.
2) Inversion Interpolation: DDIM inversion is applied to each training image, conditioned on the learned concepts; random pairs of inversions within the same category then undergo circle interpolation to generate new latent representations.
3) Two-stage Denoising: the interpolated latents are denoised in two stages using different prompts. The first stage uses a prompt containing the learned concept and a randomly sampled suffix summarizing high-frequency context patterns; the second stage refines details using a prompt with only the learned concept.
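The interpolation step above can be sketched in isolation. The snippet below is a minimal illustration, assuming the "circle interpolation" between two DDIM-inverted latents behaves like standard spherical (great-circle) interpolation; the paper's exact parameterization may differ, and the function name and toy latents are hypothetical.

```python
import numpy as np

def circle_interpolate(z1, z2, lam):
    """Spherical (circle) interpolation between two inverted latents.

    z1, z2: latent noise vectors of the same shape (here 1-D for simplicity).
    lam: interpolation coefficient in [0, 1]; 0 returns z1, 1 returns z2.
    """
    # Angle between the two latents, clipped for numerical safety.
    cos_theta = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    s = np.sin(theta)
    if s < 1e-8:
        # Nearly parallel latents: fall back to linear interpolation.
        return (1.0 - lam) * z1 + lam * z2
    # Interpolate along the circle spanned by z1 and z2.
    return (np.sin((1.0 - lam) * theta) * z1 + np.sin(lam * theta) * z2) / s

# Toy usage: interpolate two random "inversions" from the same category.
rng = np.random.default_rng(0)
z_a = rng.standard_normal(16)
z_b = rng.standard_normal(16)
z_new = circle_interpolate(z_a, z_b, 0.5)
```

In the full method, `z_new` would then be passed to the two-stage denoising process rather than sampled fresh noise, which is what lets the synthetic image stay faithful to the category while still differing from both source images.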
Diff-II effectively addresses the faithfulness-diversity trade-off in diffusion-based image augmentation. By leveraging inversion circle interpolation and two-stage denoising, it generates high-quality synthetic images that improve the generalization ability of classifiers, particularly in data-scarce scenarios.
This research contributes a novel and effective method for data augmentation in image classification, particularly beneficial for fine-grained datasets and challenging scenarios with limited training data.
Key insights distilled from arxiv.org, by Yanghao Wang..., 11-22-2024.
https://arxiv.org/pdf/2408.16266.pdf