Core Concepts
GRM introduces a novel transformer-based model using 3D Gaussians for efficient reconstruction and generation.
Abstract
The GRM model is introduced as a large-scale reconstructor capable of recovering 3D assets from sparse-view images in around 0.1s. It efficiently incorporates multi-view information to translate input pixels into pixel-aligned Gaussians, enabling scalable and efficient reconstruction. The model outperforms alternatives in both quality and efficiency, especially in generative tasks such as text-to-3D and image-to-3D. Key components include the transformer architecture, the upsampler design, and the use of pixel-aligned Gaussians as the scene representation.
Introduction
High-quality 3D assets are crucial across various domains.
Emerging generative models offer easy creation of diverse 3D assets.
Optimization-based methods are time-consuming.
Feed-forward generative methods achieve large speedups while preserving quality.
GRM - Gaussian Reconstruction Model
Introduces a feed-forward 3D generative model.
Utilizes sparse-view reconstruction with pixel-aligned Gaussians.
A transformer architecture translates input pixels into the output 3D scene.
Upsampler design improves detail reconstruction.
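The pixel-aligned Gaussian idea above can be sketched as follows: each input pixel predicts one 3D Gaussian, placed along that pixel's camera ray at a predicted depth, with learned scale, rotation, color, and opacity. This is a minimal illustrative sketch, not GRM's actual code; the function name, the linear prediction heads, and all shapes are assumptions.

```python
import numpy as np

def pixels_to_gaussians(features, rays_o, rays_d):
    """Map per-pixel features to pixel-aligned 3D Gaussians (sketch).

    features: (N, C) per-pixel feature vectors from an encoder.
    rays_o, rays_d: (N, 3) camera ray origins and unit directions.
    Returns one Gaussian per pixel as a dict of parameter arrays.
    """
    n, c = features.shape
    # Hypothetical linear heads; a real model would learn these weights.
    rng = np.random.default_rng(0)
    w_depth = rng.normal(size=(c, 1)) * 0.01
    w_scale = rng.normal(size=(c, 3)) * 0.01
    w_rot = rng.normal(size=(c, 4)) * 0.01
    w_rgb = rng.normal(size=(c, 3)) * 0.01
    w_opac = rng.normal(size=(c, 1)) * 0.01

    depth = np.exp(features @ w_depth)             # positive depth along the ray
    centers = rays_o + depth * rays_d              # center = unprojected pixel
    scales = np.exp(features @ w_scale)            # positive anisotropic scales
    quats = features @ w_rot
    quats /= np.linalg.norm(quats, axis=1, keepdims=True) + 1e-8  # unit quaternions
    rgb = 1.0 / (1.0 + np.exp(-(features @ w_rgb)))       # colors in (0, 1)
    opacity = 1.0 / (1.0 + np.exp(-(features @ w_opac)))  # opacity in (0, 1)
    return {"centers": centers, "scales": scales, "rotations": quats,
            "rgb": rgb, "opacity": opacity}
```

Tying the Gaussian position to the pixel's ray is what makes the representation "pixel-aligned": the network only has to predict a scalar depth per pixel rather than a free 3D location.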
Related Work
Neural representations for scene rendering have shown promise.
Recent advancements extend techniques to operate with sparse views.
Capturing the multiple plausible modes present in large datasets remains challenging.
Methodology
GRM uses a transformer-based encoder for input images.
Pixel-aligned Gaussians represent geometry and appearance details.
Training objectives focus on high-quality object-level reconstruction.
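The pipeline described above can be traced at the level of tensor shapes: multi-view images are patchified into tokens, the transformer fuses tokens across views, an upsampler increases the feature resolution, and each output pixel emits one Gaussian. The resolutions, patch size, and number of upsampling stages below are illustrative assumptions, not GRM's reported configuration.

```python
def grm_forward_shapes(num_views=4, img=256, patch=16, up_stages=2):
    """Shape-level walkthrough of a GRM-style pipeline (illustrative only)."""
    tokens_per_view = (img // patch) ** 2
    tokens = num_views * tokens_per_view          # transformer fuses all views' tokens
    feat_res = img // patch                       # per-view feature-map resolution
    out_res = feat_res * (2 ** up_stages)         # each upsampler stage doubles resolution
    gaussians = num_views * out_res * out_res     # one pixel-aligned Gaussian per output pixel
    return {"tokens": tokens, "out_res": out_res, "gaussians": gaussians}
```

This makes the role of the upsampler concrete: without it, the Gaussian count is capped by the coarse patch grid, so detail reconstruction is limited by the token resolution.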
Experiments
Sparse-view Reconstruction
Comparison with baselines shows superior quality and speed for GRM.
Single Image-to-3D Generation
GRM outperforms baselines in quality metrics while maintaining fast inference speed.
Text-to-3D Generation
Competitive performance compared to optimization-based methods like MVDream.
Stats
GRM is a large-scale reconstruction model that recovers 3D assets from sparse-view images in about 0.1 seconds.