Vision transformers with shifted window attention improve voxel 3D reconstruction accuracy.
Efficiently reconstruct 3D assets from single images using Gamba, leveraging Gaussian splatting and Mamba for speed and quality.
Vision transformers with shifted window attention improve voxel 3D reconstruction accuracy.
VistaDream reconstructs high-quality, consistent 3D scenes from single-view images by leveraging a novel two-stage pipeline that combines the strengths of diffusion models and vision-language models, outperforming existing methods without requiring fine-tuning.