Core Concepts
SceneSense is a real-time 3D diffusion model that predicts occluded geometries for future planning and control in robotics.
Abstract
SceneSense introduces a real-time 3D diffusion model for synthesizing 3D occupancy information from partial observations.
The framework uses a running occupancy map and a single RGB-D camera to generate predicted geometry around the platform at runtime.
By preserving observed space integrity, SceneSense mitigates the risk of corrupting observed areas with generative predictions.
The method enhances local occupancy predictions around the platform, producing occupancy estimates that match the ground-truth occupancy distribution more closely than the running occupancy map alone.
The architecture allows for extension to additional modalities beyond a single RGB-D camera.
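As a rough illustration of how observed space can be protected during generation, the sketch below masks each reverse-diffusion step so that voxels already measured by the running occupancy map are copied back in. The function `denoise_step` and the tensor shapes are assumptions for illustration, not SceneSense's actual API.

```python
import torch

def generate_local_occupancy(model, observed_occ, observed_mask, steps=50):
    """observed_occ:  (1, 1, D, H, W) occupancy from the running map, in [0, 1].
    observed_mask: (1, 1, D, H, W), 1 where a voxel has been directly observed."""
    x = torch.randn_like(observed_occ)          # start from pure noise
    for t in reversed(range(steps)):
        x = model.denoise_step(x, t)            # one reverse-diffusion step (assumed API)
        # Re-impose measured geometry so generative predictions never
        # overwrite voxels the robot has actually observed.
        x = observed_mask * observed_occ + (1 - observed_mask) * x
    return x.clamp(0.0, 1.0)                    # predicted occupancy in [0, 1]
```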
I. Introduction:
Humans rely on 'common sense' inferences, while robots are limited to decisions based on data directly measured by sensors such as cameras or lidar.
SceneSense aims to address this gap by predicting out-of-view or occluded geometry using generative AI methods.
II. Related Works:
Semantic Scene Completion (SSC) focuses on generating dense semantically labeled scenes from sparse representations in target areas.
Generative 3D Scene Synthesis involves constructing 3D scenes from multiple camera views using Neural Radiance Fields (NeRFs).
III. Preliminaries and Problem Definition:
Dense Occupancy Prediction aims to predict an occupancy value between 0 and 1 for every voxel in a target region.
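A minimal sketch of this prediction target, assuming an illustrative grid resolution and a 0.5 threshold for binarization:

```python
import numpy as np

D, H, W = 32, 32, 16                      # target-region resolution (assumed)
occ_probs = np.random.rand(D, H, W)       # stand-in for per-voxel predictions in [0, 1]

occupied = occ_probs >= 0.5               # binarize: True = occupied, False = free
print(f"{occupied.sum()} of {occupied.size} voxels predicted occupied")
```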
IV. Method:
The architecture includes denoising networks, feature extraction, conditioning, and occupancy mapping for accurate predictions.
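The sketch below shows one plausible way such conditioning could be wired together, with features extracted from the RGB-D view injected into the denoiser via cross-attention. The module, layer sizes, and attention scheme are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Illustrative denoiser conditioned on RGB-D feature tokens."""
    def __init__(self, voxel_ch=1, feat_dim=256):
        super().__init__()
        self.encode = nn.Conv3d(voxel_ch, feat_dim, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        self.decode = nn.Conv3d(feat_dim, voxel_ch, kernel_size=3, padding=1)

    def forward(self, noisy_occ, rgbd_features):
        """noisy_occ: (B, 1, D, H, W); rgbd_features: (B, N, feat_dim) tokens."""
        h = self.encode(noisy_occ)                        # (B, C, D, H, W)
        B, C, D, H, W = h.shape
        tokens = h.flatten(2).transpose(1, 2)             # (B, D*H*W, C)
        tokens, _ = self.attn(tokens, rgbd_features, rgbd_features)
        h = tokens.transpose(1, 2).view(B, C, D, H, W)
        return self.decode(h)                             # denoised occupancy estimate
```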
V. Experiments:
The Habitat Lab simulation platform and the HM3D dataset are used to generate training and test data.
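As a rough sketch of how simulated depth renders can be turned into occupancy data, the example below back-projects a depth image into 3D points and voxelizes them. The intrinsics, voxel size, and grid extent are assumed values, and this is not the paper's exact pipeline.

```python
import numpy as np

def depth_to_occupancy(depth, fx=320.0, fy=320.0, cx=320.0, cy=240.0,
                       voxel_size=0.1, grid_dim=64):
    """Back-project a depth image and mark the voxels its points fall into."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    points = points[points[:, 2] > 0]                  # drop invalid/zero depth

    # Voxelize into a grid centered on the camera.
    idx = np.floor(points / voxel_size).astype(int) + grid_dim // 2
    valid = np.all((idx >= 0) & (idx < grid_dim), axis=1)
    occ = np.zeros((grid_dim, grid_dim, grid_dim), dtype=np.uint8)
    occ[tuple(idx[valid].T)] = 1
    return occ

occ = depth_to_occupancy(np.random.uniform(0.5, 5.0, size=(480, 640)))
print(occ.sum(), "occupied voxels")
```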
VI. Results:
Quantitative comparisons show that SceneSense outperforms baseline methods on the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID) metrics.
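For reference, FID compares the mean and covariance of two feature distributions. A minimal sketch of the standard computation is below; the choice of feature extractor for occupancy data is not shown and is left as an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real, feats_gen):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):        # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2 * covmean))

print(fid(np.random.randn(500, 64), np.random.randn(500, 64)))
```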
VII. Conclusions and Future Work:
SceneSense presents a promising approach for generative local occupancy prediction in robotics applications.
Stats
SceneSense generates predicted geometry around the platform at runtime.
SceneSense preserves the integrity of observed space, reducing the risk of corrupting observed regions with generative predictions.
SceneSense improves local occupancy predictions around the platform and has been shown to represent the ground-truth occupancy distribution better than the running occupancy map.