Core Concepts
iSeg is a data-driven interactive technique for 3D shape segmentation that generates tailored partitions of the shape according to user clicks.
Abstract
The paper presents iSeg, a new interactive technique for segmenting 3D shapes. Previous work has mainly leveraged pre-trained 2D foundation models for text-based 3D segmentation, but text may be insufficient for accurately describing fine-grained spatial segmentations. Moreover, achieving a consistent 3D segmentation with a 2D model is challenging, since occluded areas of the same semantic region may not be visible together in any single 2D view.
iSeg's key components are:
Mesh Feature Field (MFF): A function that embeds each mesh vertex into a deep feature vector, distilling semantic information from a pre-trained 2D foundation model.
Interactive Attention Layer: A novel attention mechanism that can handle a variable number of user clicks, both positive and negative, to indicate regions to include or exclude.
3D Segmentation Decoder: A network that takes the MFF and the user clicks and predicts the corresponding mesh segmentation.
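To make the role of a variable number of signed clicks concrete, here is a minimal sketch of attention over clicks. It is an illustration under stated assumptions, not the paper's architecture: the per-vertex feature vectors stand in for the Mesh Feature Field, each click contributes a fixed value of +1 (positive) or -1 (negative) rather than learned projections, and the function name `segment` and its `temperature` parameter are hypothetical. The learned 3D Segmentation Decoder is replaced here by a simple rescaling to [0, 1].

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def segment(features, clicks, temperature=1.0):
    """Toy click-conditioned segmentation (hypothetical, for illustration).

    features: list of per-vertex feature vectors (stand-in for the MFF).
    clicks:   list of (vertex_index, is_positive) pairs, of any length >= 1.
    Returns one score in [0, 1] per vertex; higher means "inside the region".
    """
    if not clicks:
        raise ValueError("at least one click is required")
    dim = len(features[0])
    scale = math.sqrt(dim) * temperature
    probs = []
    for f in features:
        # Query: this vertex's feature. Keys: features of the clicked vertices.
        scores = [dot(f, features[idx]) / scale for idx, _ in clicks]
        weights = softmax(scores)  # works for any number of clicks
        # Values: +1 for a positive click, -1 for a negative click (assumption).
        vote = sum(w * (1.0 if pos else -1.0)
                   for w, (_, pos) in zip(weights, clicks))
        probs.append(0.5 * (vote + 1.0))  # map [-1, 1] to [0, 1]
    return probs


# Four vertices with 2D features: two near [1, 0], two near [0, 1].
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# One positive click on vertex 0 and one negative click on vertex 2.
print(segment(feats, [(0, True), (2, False)]))
```

Because the softmax normalizes over however many clicks are supplied, the same function handles one click or many without any change in shape. In this sketch, vertices whose features resemble a positively clicked vertex score above 0.5, and those resembling a negatively clicked vertex score below 0.5.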
Training uses the 2D foundation model for supervision, but the final segmentation is computed directly in 3D, which makes it view-consistent. The paper reports that iSeg works across a variety of shapes and geometries, follows the user's specifications faithfully, and outperforms alternative interactive 3D segmentation methods.