Core Concepts
EpiDiff efficiently generates high-quality multiview images from a single input image, surpassing previous methods on quality metrics.
Abstract
EpiDiff introduces a localized interactive multiview diffusion model that leverages epipolar constraints to enhance cross-view interaction among neighboring views. The model can generate 16 multiview images in just 12 seconds, outperforming previous methods in quality evaluation metrics like PSNR, SSIM, and LPIPS. By incorporating a lightweight epipolar attention block into the UNet, EpiDiff enables the generation of more diverse views while maintaining consistency and efficiency. Extensive experiments validate the effectiveness of EpiDiff in generating multiview-consistent and high-quality images.
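The abstract's key mechanism is an attention block in which each pixel of a target view attends only to features lying along its epipolar line in neighboring views. As a minimal sketch of that idea (not EpiDiff's actual implementation), the following assumes the epipolar sampling has already been done, so each query pixel comes with `S` pre-gathered key/value features from a neighbor view; the function name and shapes are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def epipolar_cross_view_attention(query, key, value):
    """Scaled dot-product attention restricted to epipolar samples.

    query:      (N, d)    per-pixel features of the target view
    key, value: (N, S, d) S features sampled along each pixel's
                          epipolar line in a neighboring view
    Returns:    (N, d)    aggregated neighboring-view features
    """
    d = query.shape[-1]
    # (N, S): similarity between each pixel and its own epipolar samples only,
    # so attention stays local instead of dense across the full image.
    scores = np.einsum('nd,nsd->ns', query, key) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # (N, d): weighted sum of the sampled neighbor-view values.
    return np.einsum('ns,nsd->nd', weights, value)
```

Restricting keys and values to the `S` epipolar samples is what keeps the block lightweight: cost grows with `N * S` rather than `N * N` as in dense cross-view attention.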
Stats
EpiDiff generates 16 multiview images in just 12 seconds.
EpiDiff surpasses previous methods in quality evaluation metrics like PSNR, SSIM, and LPIPS.