Core Concepts
MonoSelfRecon, a novel framework that achieves explicit 3D mesh reconstruction of generalizable indoor scenes from monocular RGB views through purely self-supervised training on voxel-SDF, without requiring any depth or SDF annotations.
Abstract
The content presents MonoSelfRecon, a novel framework that addresses the challenge of efficient and accurate 3D scene reconstruction from monocular RGB views. The key highlights are:
MonoSelfRecon is the first to achieve explicit 3D mesh reconstruction for generalizable indoor scenes using purely self-supervised training on voxel-SDF, without requiring any depth or SDF annotations.
The framework follows an autoencoder-based architecture that decodes both voxel-SDF and a generalizable Neural Radiance Field (NeRF), with the NeRF guiding the voxel-SDF during self-supervision.
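To make the voxel-SDF side of this concrete, the sketch below shows how an explicit depth can be read out of SDF values sampled along a camera ray, which is the kind of quantity a NeRF-rendered depth could supervise. The function name, the uniform sampling, and the linear zero-crossing interpolation are all illustrative assumptions, not the paper's actual implementation.

```python
def sdf_ray_depth(sdf_samples, step):
    """Return the depth of the first zero-crossing of SDF values
    sampled at uniform spacing `step` along a ray, or None if the
    ray never hits a surface. (Illustrative sketch only.)"""
    for i in range(len(sdf_samples) - 1):
        s0, s1 = sdf_samples[i], sdf_samples[i + 1]
        if s0 > 0 >= s1:  # sign change: surface between samples i and i+1
            # linearly interpolate to locate the zero crossing
            t = s0 / (s0 - s1)
            return (i + t) * step
    return None

# Toy ray: SDF values shrink toward a surface ~0.25 units from the camera.
samples = [0.25, 0.15, 0.05, -0.05, -0.15]
depth = sdf_ray_depth(samples, step=0.1)  # zero crossing at depth 0.25
```

Because a depth like this is differentiable in the SDF values (via the interpolation weight), gradients from a NeRF-depth consistency term can flow back into the voxel-SDF decoder.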
Novel self-supervised losses are proposed that not only enable pure self-supervision, but can also be combined with supervised signals to further boost supervised training.
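A minimal sketch of how one objective can serve both regimes: a self-supervised consistency term between SDF-derived and NeRF-rendered depths, with an optional supervised term switched on when ground-truth depth exists. The function, its L1 form, and the weighting are hypothetical; the paper's actual losses are not specified in this summary.

```python
def recon_loss(sdf_depth, nerf_depth, gt_depth=None, w_gt=1.0):
    """Mean-L1 consistency between SDF-derived and NeRF-rendered
    per-ray depths; if ground-truth depth is provided, add a
    supervised term so both regimes share one objective.
    (Hypothetical sketch, not the paper's loss.)"""
    n = len(sdf_depth)
    self_sup = sum(abs(a - b) for a, b in zip(sdf_depth, nerf_depth)) / n
    if gt_depth is None:
        return self_sup  # purely self-supervised setting
    sup = sum(abs(a - g) for a, g in zip(sdf_depth, gt_depth)) / n
    return self_sup + w_gt * sup  # self-supervised + supervised boost

# Self-supervised only: identical depths give zero loss.
loss = recon_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The point of the shared structure is that annotations, when available, strengthen rather than replace the self-supervised signal.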
Experiments show that MonoSelfRecon, trained in pure self-supervision, outperforms the current best self-supervised indoor depth estimation models and is comparable to fully supervised 3D reconstruction models trained with depth annotations.
The framework is not tied to a specific model design and can extend any model with a voxel-SDF representation to purely self-supervised 3D reconstruction.
Stats
The content does not provide any specific numerical data or statistics. The focus is on the novel framework and self-supervised training approach.
Quotes
The content does not contain any striking quotes that support the key arguments.