Key Concepts
DIG3D is a novel approach that marries Gaussian splatting with a deformable transformer to reconstruct 3D objects efficiently and accurately from a single RGB image.
Summary
The paper proposes DIG3D, a method for 3D reconstruction and novel view synthesis from a single RGB image. The key highlights are:
- DIG3D utilizes an encoder-decoder framework that generates 3D Gaussians in the decoder with the guidance of depth-aware image features from the encoder.
- The method introduces the use of a deformable transformer in the decoder, allowing efficient and effective decoding through 3D reference point and multi-layer refinement adaptations.
- By harnessing the benefits of 3D Gaussians, DIG3D offers an efficient and accurate solution for 3D reconstruction from single-view images. It outperforms recent methods like Splatter Image on the ShapeNet SRN dataset.
- The paper makes two key adaptations to the DETR framework to handle 3D Gaussians effectively: 1) projecting the center of each 3D Gaussian onto the image plane as a reference point, and 2) updating the 3D Gaussian parameters using specific operations in the multi-layer refinement process.
- Experiments on the ShapeNet SRN dataset demonstrate the superiority of DIG3D in terms of rendering quality, 3D geometry reconstruction, and inference speed compared to state-of-the-art methods.
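The first adaptation, projecting each Gaussian's center onto the image plane to obtain a reference point, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera with centers already expressed in camera coordinates, and the function name `project_center`, the intrinsics values, and the normalization and clamping to [0, 1] are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def project_center(
    center: Sequence[float],
    focal: Tuple[float, float],
    principal: Tuple[float, float],
    image_size: Tuple[int, int],
) -> Tuple[float, float]:
    """Project a 3D Gaussian center to a normalized 2D reference point.

    Assumes a pinhole camera looking down +z (hypothetical setup).
    Returns (u, v) with each coordinate clamped to [0, 1].
    """
    x, y, z = center
    if z <= 0:
        raise ValueError("center must lie in front of the camera (z > 0)")
    fx, fy = focal
    cx, cy = principal
    width, height = image_size

    # Perspective division followed by the intrinsics gives pixel coordinates.
    u_px = fx * x / z + cx
    v_px = fy * y / z + cy

    # Normalize by the image size and clamp so the point stays on the feature map.
    u = min(max(u_px / width, 0.0), 1.0)
    v = min(max(v_px / height, 0.0), 1.0)
    return u, v


# Example intrinsics for a 128x128 image (illustrative values only).
ref_point = project_center(
    (0.5, -0.25, 2.0),
    focal=(100.0, 100.0),
    principal=(64.0, 64.0),
    image_size=(128, 128),
)
```

In a deformable-attention decoder, a reference point of this kind would typically serve as the location around which image features are sampled; how DIG3D does this in detail is not covered by this summary.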
Stats
"Our method surpasses all the methods shown in the table for both chairs and cars."
"Our approach produces smoother and more meaningful results. For instance, in cases where one chair leg obstructs another, Splatter Image still renders the leg behind, as illustrated in Figure 4. In contrast, our method accurately captures the occlusion and generates a more realistic rendering."
"When we filter out the 50% lowest opacity points, most of the background points in the input view are removed, resulting in a waste of Gaussian points. In contrast, our method ensures that all Gaussians contribute to the 3D object. The geometry of our objects is nearly accurate, and removing low opacity points does not compromise the overall 3D structure."
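The opacity filtering described in the last quote can be illustrated with a short sketch. It assumes each Gaussian carries a single scalar opacity; the function name `drop_lowest_opacity` and the sample values are hypothetical, and the authors' actual filtering code may differ.

```python
from typing import List, Sequence


def drop_lowest_opacity(opacities: Sequence[float], drop_fraction: float = 0.5) -> List[int]:
    """Return the indices of the Gaussians that survive opacity filtering.

    The `drop_fraction` lowest-opacity Gaussians are removed; the count
    to drop is rounded down when it is not a whole number.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    n_drop = int(len(opacities) * drop_fraction)
    # Rank indices from lowest to highest opacity, then discard the first n_drop.
    ranked = sorted(range(len(opacities)), key=lambda i: opacities[i])
    return sorted(ranked[n_drop:])


# Illustrative opacities for four Gaussians (made-up values).
kept = drop_lowest_opacity([0.9, 0.1, 0.5, 0.05])
```

Here `kept` holds the indices of the two most opaque Gaussians. Comparing the geometry before and after such a filter is one way to check whether low-opacity Gaussians were being spent on background rather than on the object.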
Quotes
"Our method exhibits a substantial improvement in performance compared to Splatter Image. When comparing Table 1 and Table 3, our model shows minimal decrease in metrics when removing the values of the 8 views near the input view. However, the Splatter Image dataset exhibits a notable decrement in performance."
"This comparison provides evidence that our approach performs better, particularly when it comes to generating novel views that [are] far from input views."