Core Concepts
SA3D is an efficient solution that couples the Segment Anything Model (SAM) with radiance fields: given prompts in a single 2D view, it segments 3D objects by iteratively refining a 3D mask of the target across multiple views.
Abstract
The paper proposes Segment Anything in 3D (SA3D), an efficient solution for 3D segmentation that integrates the Segment Anything Model (SAM) with radiance fields.
Key highlights:
SA3D takes prompts (e.g., click points) from a single 2D view as input and uses SAM to generate a 2D segmentation mask.
SA3D then alternates between two steps, mask inverse rendering and cross-view self-prompting, to iteratively refine the 3D mask of the target object across multiple views (the full loop is sketched after this list).
Mask inverse rendering projects the 2D mask into 3D space by distributing the mask's confidence along each ray according to the density distribution learned by the radiance field (see the first sketch below).
Cross-view self-prompting renders the current 3D mask into a new view and automatically extracts reliable point prompts from the rendered 2D mask, which are then fed back to SAM to generate a more accurate 2D mask for that view (see the second sketch below).
The iterative process continues until the 3D mask is refined across all necessary views.
SA3D can adapt to various radiance field representations, including vanilla NeRF, TensoRF, and 3D Gaussian Splatting (3D-GS), and achieves high-quality 3D segmentation within seconds.
Experiments on multiple datasets demonstrate the effectiveness and efficiency of SA3D in various segmentation settings, including object-level, part-level, and text-prompted segmentation.
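The highlights above compress the method into a few moving parts; the sketches below illustrate each one. First, a minimal sketch of mask inverse rendering, assuming a PyTorch voxel grid holds the 3D mask and that the frozen radiance field exposes per-ray sample points and volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)). The helper `world_to_voxel` and the exact tensor shapes are illustrative assumptions, not SA3D's actual API:

```python
import torch

def mask_inverse_render(mask_2d, ray_pts, weights, mask_grid, world_to_voxel):
    """Distribute a 2D SAM mask onto a 3D mask grid (hypothetical sketch).

    mask_2d:        (R,) mask value in {0, 1} for each sampled ray/pixel
    ray_pts:        (R, S, 3) world-space sample points along each ray
    weights:        (R, S) volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))
                    taken from the frozen radiance field
    mask_grid:      (X, Y, Z) running 3D mask confidence, updated in place
    world_to_voxel: maps (N, 3) world coordinates to (N, 3) voxel indices
    """
    # Each sample point receives the pixel's mask value, scaled by how much
    # that point contributed to the rendered pixel (its rendering weight).
    contrib = mask_2d[:, None] * weights                  # (R, S)
    idx = world_to_voxel(ray_pts.reshape(-1, 3)).long()   # (R*S, 3)
    mask_grid.index_put_(tuple(idx.T),
                         contrib.reshape(-1).to(mask_grid.dtype),
                         accumulate=True)
    return mask_grid
```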
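Next, a sketch of cross-view self-prompting under the same assumptions: the current 3D mask is rendered into the new view as a 2D confidence map, and a few high-confidence pixels are picked as point prompts for SAM. The threshold and suppression radius are illustrative choices, not values from the paper:

```python
import torch

def extract_prompts(rendered_mask, n_points=3, threshold=0.3, nms_radius=16):
    """Pick point prompts from the 2D mask rendered from the current 3D mask."""
    conf = rendered_mask.clone()
    points = []
    for _ in range(n_points):
        flat = torch.argmax(conf)
        y, x = divmod(int(flat), conf.shape[1])
        if conf[y, x] < threshold:   # nothing confident left in this view
            break
        points.append([x, y])        # SAM expects (x, y) pixel coordinates
        # Zero out a neighborhood so later prompts cover different regions.
        conf[max(0, y - nms_radius): y + nms_radius,
             max(0, x - nms_radius): x + nms_radius] = 0.0
    return points
```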
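Finally, how the two steps alternate. The loop below uses the real segment_anything API (SamPredictor), but `render_view`, `render_mask`, `mask_inverse_render_view` (a wrapper around the first sketch), `poses`, and the initial user click `(ux, uy)` are placeholders for the radiance-field machinery:

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# One user click (ux, uy) on the first view seeds the 3D mask.
predictor.set_image(render_view(poses[0]))            # placeholder renderer
masks, _, _ = predictor.predict(point_coords=np.array([[ux, uy]]),
                                point_labels=np.array([1]),
                                multimask_output=False)
mask_grid = mask_inverse_render_view(masks[0], poses[0], mask_grid)

# Alternate self-prompting and mask inverse rendering over the other views.
for pose in poses[1:]:
    prompts = extract_prompts(render_mask(mask_grid, pose))
    if not prompts:
        continue                                      # object not visible here
    predictor.set_image(render_view(pose))
    masks, _, _ = predictor.predict(point_coords=np.array(prompts),
                                    point_labels=np.ones(len(prompts), dtype=int),
                                    multimask_output=False)
    mask_grid = mask_inverse_render_view(masks[0], pose, mask_grid)
```

Because the radiance field is frozen throughout, each iteration only updates the lightweight mask grid, which is why segmentation completes within seconds rather than requiring retraining.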
Stats
SA3D-TensoRF (with feature cache) achieves 92.2% mIoU and 98.5% mAcc on the NVOS dataset.
SA3D-GS (with feature cache) achieves 93.2% mIoU and 99.1% mAcc on the SPIn-NeRF dataset.
SA3D-TensoRF (with feature cache) achieves 82.7% mIoU on the Replica dataset.
Quotes
"SA3D adapts to various scenes and achieves 3D segmentation within seconds."
"Our research reveals a potential methodology to lift the ability of a 2D segmentation model to 3D."