Deep Compression Autoencoder (DC-AE): A Novel Approach for Accelerating High-Resolution Diffusion Models


Core Concepts
This paper introduces DC-AE, a new type of autoencoder that significantly speeds up high-resolution image synthesis in diffusion models by achieving higher spatial compression ratios while maintaining reconstruction accuracy.
Summary
  • Bibliographic Information: Chen, J., Cai, H., Chen, J., Xie, E., Yang, S., Tang, H., ... & Han, S. (2024). Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models. arXiv preprint arXiv:2410.10733v1.
  • Research Objective: This paper aims to address the computational bottleneck of high-resolution image synthesis in diffusion models by developing a novel autoencoder capable of achieving high spatial compression ratios without compromising reconstruction accuracy.
  • Methodology: The researchers propose Deep Compression Autoencoder (DC-AE), which incorporates two key techniques: (1) Residual Autoencoding, enabling the model to learn residuals based on space-to-channel transformed features, and (2) Decoupled High-Resolution Adaptation, a three-phase training strategy mitigating the generalization penalty associated with high spatial compression.
  • Key Findings: DC-AE achieves spatial compression ratios of up to 128x while maintaining high reconstruction quality. When applied to latent diffusion models, DC-AE speeds up both training and inference without sacrificing image generation quality. For instance, on ImageNet 512 × 512 with UViT-H on an H100 GPU, DC-AE achieves a 19.1x inference speedup and a 17.9x training speedup, with a better FID score than the widely used SD-VAE-f8 autoencoder.
  • Main Conclusions: This research presents DC-AE as a viable solution for accelerating high-resolution diffusion models. By effectively compressing the latent space representation, DC-AE enables faster training and inference without compromising the quality of generated images.
  • Significance: This work addresses the computational cost of high-resolution image synthesis in diffusion models. The proposed DC-AE architecture could facilitate the development of more efficient and scalable diffusion models for various applications.
  • Limitations and Future Research: While DC-AE demonstrates promising results, further exploration is needed to investigate its performance on a wider range of datasets and diffusion model architectures. Additionally, exploring alternative compression techniques and training strategies could further enhance the efficiency and accuracy of DC-AE.
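The Residual Autoencoding idea from the Methodology item above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the factor of 2, and the placeholder `zero_branch` are assumptions made for illustration, and any channel-matching step the real model may need is omitted. The sketch only shows how a space-to-channel rearrangement can serve as a lossless shortcut to which a learned branch adds a residual.

```python
import numpy as np


def space_to_channel(x, factor=2):
    """Rearrange a (C, H, W) array into (C * factor**2, H / factor, W / factor).

    Every input value is kept; spatial resolution is traded for channels.
    """
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be divisible by factor"
    x = x.reshape(c, h // factor, factor, w // factor, factor)
    x = x.transpose(0, 2, 4, 1, 3)  # -> (C, factor, factor, H/factor, W/factor)
    return x.reshape(c * factor * factor, h // factor, w // factor)


def residual_downsample(x, learned_branch, factor=2):
    """Downsample as shortcut + residual.

    `learned_branch` is a hypothetical stand-in for trainable layers; it must
    return an array with the same shape as the space-to-channel shortcut.
    """
    shortcut = space_to_channel(x, factor)
    return shortcut + learned_branch(x)


def zero_branch(x, factor=2):
    """Placeholder (assumption): a branch that has learned nothing yet."""
    c, h, w = x.shape
    return np.zeros((c * factor * factor, h // factor, w // factor), dtype=x.dtype)


# With a zero residual, the block reduces to the pure rearrangement.
x = np.arange(2 * 4 * 4, dtype=np.float64).reshape(2, 4, 4)
y = residual_downsample(x, zero_branch)
```

With the shortcut in place, the learned branch only has to model the difference from this rearrangement rather than the full downsampling mapping, which is the sense in which the block "learns residuals".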
Statistics
On ImageNet 512 × 512, DC-AE provides a 19.1x inference speedup and a 17.9x training speedup on an H100 GPU for UViT-H while achieving a better FID than the widely used SD-VAE-f8 autoencoder. On ImageNet 256 × 256, the reconstruction FID (rFID) of SD-VAE degrades from 0.90 to 28.3 when switching from an 8x to a 64x spatial compression ratio.
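To make the compression ratios concrete, the following sketch counts how many spatial positions a latent has for a given image size and spatial compression ratio. The helper name `latent_positions` is hypothetical, and the count ignores any patching the diffusion model applies on top of the latent, so it should not be read as an explanation of the exact speedup figures quoted above.

```python
def latent_positions(image_size, compression_ratio):
    """Number of spatial positions in the latent of a square image.

    Assumes the image side is divisible by the spatial compression ratio.
    """
    if image_size % compression_ratio != 0:
        raise ValueError("image_size must be divisible by compression_ratio")
    side = image_size // compression_ratio
    return side * side


# Compare the ratios mentioned in this summary for a 512 x 512 image.
for ratio in (8, 32, 64, 128):
    side = 512 // ratio
    print(f"f{ratio}: {side} x {side} latent, {latent_positions(512, ratio)} positions")
```

For a 512 × 512 image, moving from an 8x to a 64x ratio shrinks the latent from 64 × 64 to 8 × 8, that is, from 4096 to 64 positions.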
Quotes
"This work presents Deep Compression Autoencoder (DC-AE), a new family of high spatial-compression autoencoders for efficient high-resolution image synthesis."
"With these techniques, we increase the spatial compression ratio of autoencoders to 32, 64, and 128 while maintaining good reconstruction accuracy."

Key insights drawn from

by Junyu Chen, ... at arxiv.org 10-15-2024

https://arxiv.org/pdf/2410.10733.pdf
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models
