Core Concepts
A memory-efficient neural 3D reconstruction method that uses an adaptive spherical binoctree data structure and signed distance functions to accurately reconstruct large-scale unbounded scenes from short egocentric omnidirectional video inputs.
Abstract
The paper introduces a novel method for 3D reconstruction of large-scale unbounded scenes from short egocentric omnidirectional video inputs. The key ideas are:
Representation: The geometry is represented using a neural signed distance function (SDF) within an adaptive spherical binoctree data structure. This allows efficient partitioning of the reconstruction space to focus sampling on areas with more detail.
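To make the data structure concrete, here is a minimal sketch of a single spherical-binoctree cell ("sphoxel"). The field names and bounds layout are illustrative assumptions, not the paper's implementation; the key point is that each cell is bounded by two radii and two angular intervals, so cells naturally grow with distance from the capture center.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Sphoxel:
    """One cell of a spherical binoctree (illustrative sketch;
    field names are assumptions, not the paper's code)."""
    r_min: float
    r_max: float
    theta_min: float   # polar angle, radians
    theta_max: float
    phi_min: float     # azimuth, radians
    phi_max: float
    children: list = field(default_factory=list)  # empty list => leaf

    def is_leaf(self) -> bool:
        return not self.children

    def contains(self, r: float, theta: float, phi: float) -> bool:
        """Half-open containment test in spherical coordinates."""
        return (self.r_min <= r < self.r_max
                and self.theta_min <= theta < self.theta_max
                and self.phi_min <= phi < self.phi_max)
```

A root cell covering the whole sphere would span the full polar and azimuthal ranges, with subdivision splitting the radial and angular intervals as needed.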
Adaptive Subdivision: The spherical binoctree is adaptively subdivided during the optimization process, starting from a coarse initial structure and refining it based on the estimated SDF values. This enables the method to capture high-frequency details while maintaining memory efficiency.
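The refinement loop can be sketched with a standard SDF pruning argument: a surface can pass through a cell only if the SDF magnitude at the cell center does not exceed the radius of the cell's bounding sphere. The sketch below uses an axis-aligned cube as a Cartesian stand-in for sphoxels, and its names and exact criterion are assumptions, not the paper's.

```python
import math

def surface_may_cross(sdf_center: float, half_size: float) -> bool:
    # A surface can intersect a cube of half-width h only if |SDF| at
    # its center is at most the bounding-sphere radius sqrt(3) * h.
    return abs(sdf_center) <= math.sqrt(3.0) * half_size

def refine(center, half_size, sdf, depth=0, max_depth=4):
    """Recursive adaptive subdivision sketch (Cartesian analogue of
    sphoxel refinement). Returns leaf cells as (center, half_size)."""
    if depth == max_depth or not surface_may_cross(sdf(center), half_size):
        return [(center, half_size)]  # keep as a leaf
    h = half_size / 2.0
    leaves = []
    for dx in (-h, h):
        for dy in (-h, h):
            for dz in (-h, h):
                child = (center[0] + dx, center[1] + dy, center[2] + dz)
                leaves.extend(refine(child, h, sdf, depth + 1, max_depth))
    return leaves
```

Run against a unit-sphere SDF, cells far from the surface stay coarse while cells straddling it subdivide to the depth limit, which is the memory saving the paper relies on.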
Sampling Strategies: The paper proposes three levels of sampling: whole-space spherical sampling, coarse sphoxel (spherical voxel) sampling, and adaptive fine sphoxel sampling. This hierarchical sampling approach guides the optimization towards plausible surface regions.
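The three-level idea can be illustrated with NeRF-style stratified sampling along a ray: sparse samples over the whole ray range, plus denser samples restricted to the intervals where the ray crosses coarse and fine cells. The function names and sample budgets below are assumptions for illustration, not the paper's exact scheme.

```python
import random

def stratified_samples(t_near, t_far, n):
    """Stratified 1D samples: one uniform draw per equal-width bin."""
    step = (t_far - t_near) / n
    return [t_near + (i + random.random()) * step for i in range(n)]

def hierarchical_samples(coarse_intervals, fine_intervals,
                         n_space, n_coarse, n_fine, t_max):
    """Three-level sampling sketch: whole-space samples over [0, t_max],
    then denser samples inside coarse and fine cell intervals."""
    ts = stratified_samples(0.0, t_max, n_space)
    for lo, hi in coarse_intervals:        # intervals where the ray
        ts += stratified_samples(lo, hi, n_coarse)  # crosses coarse cells
    for lo, hi in fine_intervals:          # intervals inside fine cells
        ts += stratified_samples(lo, hi, n_fine)
    return sorted(ts)
```

Concentrating the extra samples inside cell intervals is what lets the optimization spend its budget near plausible surfaces instead of in empty space.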
Intersection and Surface Detection: Novel techniques are introduced to handle the irregular shape of sphoxels, including a ray-triangle intersection test and a surface existence test based on the SDF value and sphoxel radius.
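A standard building block for ray tests against a triangulated cell boundary is the classic Möller-Trumbore ray-triangle intersection, shown below as one plausible component; the paper's exact routine for curved sphoxel faces is not reproduced here.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_intersect(orig, dirn, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test. Returns hit distance t along the ray,
    or None when the ray misses (or is parallel to) the triangle."""
    e1, e2 = _sub(v1, v0), _sub(v2, v0)
    p = _cross(dirn, e2)
    det = _dot(e1, p)
    if abs(det) < eps:
        return None                   # ray parallel to triangle plane
    inv = 1.0 / det
    s = _sub(orig, v0)
    u = _dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = _cross(s, e1)
    v = _dot(dirn, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = _dot(e2, q) * inv
    return t if t > eps else None     # only hits in front of the ray
```

The surface existence test mentioned above plays a complementary role: it prunes sphoxels whose SDF magnitude already rules out a surface, so the intersection test only runs where it can matter.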
The method is evaluated on both synthetic and real-world datasets, showing improved reconstruction accuracy and memory efficiency compared to state-of-the-art neural and classical 3D reconstruction approaches.
Stats
The number of voxels a dense regular Cartesian grid would need to match the minimum voxel size of the adaptive spherical binoctree is far higher than the number of sphoxels the binoctree actually allocates (dense grid vs. binoctree):
Sponza: 33,335,054,331 vs 4,346,041 voxels
Lone-monk: 2,234,638,740 vs 231,237 voxels
San Miguel: 1,953,273,076 vs 951,703 voxels
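As a quick sanity check, the compression ratios implied by these counts can be computed directly (a back-of-the-envelope calculation from the numbers quoted above, not a figure from the paper):

```python
# Dense-grid voxel count vs. adaptive sphoxel count, as quoted above.
scenes = {
    "Sponza":     (33_335_054_331, 4_346_041),
    "Lone-monk":  (2_234_638_740, 231_237),
    "San Miguel": (1_953_273_076, 951_703),
}
ratios = {name: dense / ours for name, (dense, ours) in scenes.items()}
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: ~{ratio:,.0f}x fewer cells")
```

The adaptive structure uses roughly three to four orders of magnitude fewer cells than an equivalent-resolution dense grid.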
Quotes
"Our approach allows for more efficient and accurate 3D surface reconstruction of large-scale unbounded scenes using omnidirectional video inputs."
"We show that our method outperforms other state-of-the-art 3D reconstruction methods in balancing detail and memory cost."