The authors propose a novel lossy image compression codec that uses foundation latent diffusion models to synthesize details lost during compression, particularly at low bitrates. The key aspects of their approach are summarized below.
Unlike previous work, their formulation requires only a fraction of the iterative diffusion steps and can be trained on a dataset of fewer than 100k images. The authors also directly optimize a distortion objective between the input and reconstructed images, enforcing coherence with the input while the diffusion backbone keeps the reconstructions highly realistic.
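To make the described pipeline concrete, here is a minimal sketch (not the authors' code) of one training step of a diffusion-based codec of this kind: encode the image to a latent, quantize the latent as the transmitted representation, run a small number of denoising steps to synthesize detail, decode, and optimize a distortion loss between input and reconstruction. The pretrained latent-diffusion components are replaced by tiny stand-in networks so the example runs end to end; all module and variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):   # stand-in for a pretrained VAE encoder
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 4, kernel_size=8, stride=8)
    def forward(self, x):
        return self.net(x)

class TinyDecoder(nn.Module):   # stand-in for a pretrained VAE decoder
    def __init__(self):
        super().__init__()
        self.net = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)
    def forward(self, z):
        return self.net(z)

class TinyDenoiser(nn.Module):  # stand-in for the diffusion U-Net backbone
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 4, kernel_size=3, padding=1)
    def forward(self, z, t):
        return self.net(z)

encoder, decoder, denoiser = TinyEncoder(), TinyDecoder(), TinyDenoiser()
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def quantize(z):
    # Straight-through rounding: hard quantization forward, identity gradient back.
    return z + (torch.round(z) - z).detach()

x = torch.rand(2, 3, 256, 256)        # batch of input images in [0, 1]
z_hat = quantize(encoder(x))          # quantized latent = the transmitted representation

# Only a handful of denoising steps refining the quantized latent
# (a loose stand-in for the truncated diffusion process described above).
z = z_hat
for t in reversed(range(4)):
    z = z - 0.1 * denoiser(z, t)

x_rec = decoder(z)
loss = F.mse_loss(x_rec, x)           # distortion objective between input and reconstruction
opt.zero_grad()
loss.backward()
opt.step()
print(f"distortion loss: {loss.item():.4f}")
```

In the paper's setting, the encoder/decoder and denoiser would be the frozen or fine-tuned components of a foundation latent diffusion model rather than the toy networks used here.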
The authors extensively evaluate their method against state-of-the-art generative compression methods on several datasets. Their experiments show that their approach achieves state-of-the-art visual quality as measured by FID, and that their reconstructions are subjectively preferred by end users, even when competing methods use twice the bitrate.
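For readers unfamiliar with how such comparisons are typically measured, the sketch below (an assumed evaluation setup, not the authors' script) shows how bits-per-pixel is computed for a compressed image and how FID can be computed over a set of reconstructions using torchmetrics (which requires the torch-fidelity package). All file sizes and tensors here are dummy placeholders.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def bits_per_pixel(num_bitstream_bytes: int, height: int, width: int) -> float:
    # bpp = total bits in the compressed representation / number of pixels
    return num_bitstream_bytes * 8 / (height * width)

print(bits_per_pixel(num_bitstream_bytes=9830, height=768, width=512))  # ~0.2 bpp

# FID between reference images and reconstructions (dummy random data here;
# a real evaluation uses hundreds of images and the default 2048-dim features).
fid = FrechetInceptionDistance(feature=64, normalize=True)  # expects floats in [0, 1]
references = torch.rand(32, 3, 256, 256)
reconstructions = torch.rand(32, 3, 256, 256)
fid.update(references, real=True)
fid.update(reconstructions, real=False)
print(f"FID: {fid.compute().item():.2f}")
```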
Key insights extracted from the paper by Lucas Relic, ... at arxiv.org, 04-15-2024.
https://arxiv.org/pdf/2404.08580.pdf