Core Concepts
The authors propose memory-based adapters that equip offline 3D scene perception models with online perception capability: plug-and-play modules are inserted into existing architectures, which are then finetuned on streaming RGB-D videos.
Abstract
Memory-based adapters are introduced to enhance temporal modeling in online 3D scene perception tasks. By caching and aggregating features over time, the proposed framework outperforms state-of-the-art methods on multiple datasets.
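To make the caching-and-aggregation idea concrete, here is a minimal PyTorch-style sketch of such an adapter. The class name `MemoryAdapter`, the fixed-length memory queue, and the attention-based aggregation are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class MemoryAdapter(nn.Module):
    """Caches per-frame features and aggregates them over time (sketch)."""

    def __init__(self, dim: int, memory_size: int = 8, num_heads: int = 4):
        super().__init__()
        self.memory_size = memory_size          # number of past frames to keep
        self.memory: list[torch.Tensor] = []    # cached per-frame features
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def reset(self) -> None:
        """Clear the cache at the start of a new video sequence."""
        self.memory.clear()

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, tokens, dim) features of the current frame.
        # Cache the current frame, dropping the oldest entry when full.
        self.memory.append(feat.detach())
        if len(self.memory) > self.memory_size:
            self.memory.pop(0)
        # Aggregate: the current frame attends to all cached frames.
        mem = torch.cat(self.memory, dim=1)  # (batch, T * tokens, dim)
        fused, _ = self.attn(query=feat, key=mem, value=mem)
        # The residual connection keeps the adapter plug-and-play: the
        # backbone's original features pass through, and the adapter only
        # adds temporal context on top of them.
        return self.norm(feat + fused)
```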
The paper gives a detailed explanation of the proposed framework, its components, and how they are integrated into existing models. The approach is validated through extensive experiments on the ScanNet and SceneNN datasets, showing significant improvements in semantic segmentation, object detection, and instance segmentation.
Key points include the need for online 3D scene perception when inputs arrive as streaming RGB-D video, the design of memory-based adapters for both image and point cloud backbones, and the application of these adapters to boost performance across tasks.
The study highlights the importance of temporal information when processing 3D scenes frame by frame, and of memory mechanisms for efficient feature aggregation. The results show superior performance over both offline and online methods without requiring model-specific designs or additional loss functions.
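The following sketch illustrates how such adapters could be inserted into a frozen offline backbone and run frame by frame. `MemoryAdapter` refers to the sketch above; `backbone.stages` is an assumed attribute exposing the backbone's feature stages, and only the adapter parameters are finetuned. This is a hedged illustration of the plug-and-play idea, not the paper's exact training recipe.

```python
import torch.nn as nn


def add_adapters(backbone: nn.Module, stage_dims: list[int]) -> nn.ModuleList:
    """Freeze the pretrained backbone and create one adapter per stage."""
    for p in backbone.parameters():
        p.requires_grad = False  # only the adapters are finetuned
    return nn.ModuleList(MemoryAdapter(d) for d in stage_dims)


def forward_video(backbone: nn.Module, adapters: nn.ModuleList, frames):
    """Process a streaming RGB-D video frame by frame (online setting)."""
    for adapter in adapters:
        adapter.reset()  # new sequence: clear all cached memories
    outputs = []
    for frame in frames:  # frames arrive one at a time
        x = frame
        for stage, adapter in zip(backbone.stages, adapters):
            x = stage(x)    # original offline feature extraction
            x = adapter(x)  # temporal aggregation via the memory cache
        outputs.append(x)
    return outputs
```

Because each adapter sits between existing stages and starts from a residual identity path, the offline model's behavior is preserved before finetuning, which is what makes the modules plug-and-play.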
Highlights
We propose a general framework for online 3D scene perception.
Our approach achieves leading performance on three 3D scene perception tasks.
Extensive experiments on the ScanNet and SceneNN datasets demonstrate the effectiveness of our approach.
Our method significantly boosts accuracy compared to state-of-the-art online methods.
Equipped with our memory-based adapters, offline models achieve better performance on online tasks.