Core Concepts
PosSAM integrates SAM and CLIP for robust open-vocabulary panoptic segmentation.
Abstract
PosSAM introduces an open-vocabulary panoptic segmentation model that combines SAM's spatial features with CLIP's semantic features.
The model addresses SAM's limitations in generating class- and instance-aware masks.
It leverages a Local Discriminative Pooling (LDP) module and a Mask-Aware Selective Ensembling (MASE) algorithm for improved classification.
Extensive experiments demonstrate state-of-the-art performance across various datasets.
Results show significant improvements over existing methods in both the COCO-to-ADE20K and ADE20K-to-COCO settings.
Stats
PosSAM showed significant improvements of 2.4 PQ in the COCO-to-ADE20K setting and 4.6 PQ in the ADE20K-to-COCO setting.
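
The figures above are given in PQ (Panoptic Quality), the standard panoptic segmentation metric. As a reminder of what a PQ point measures, here is a minimal sketch of the standard per-class definition: the sum of IoUs over matched segment pairs, divided by TP + 0.5 FP + 0.5 FN. The function name, argument names, and toy numbers are illustrative assumptions and are not taken from the PosSAM evaluation code.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Per-class Panoptic Quality under the standard definition.

    matched_ious: IoU of each predicted/ground-truth segment pair that counts
        as a match (a pair matches only when its IoU is above 0.5).
    num_false_positives: predicted segments with no matching ground truth.
    num_false_negatives: ground-truth segments with no matching prediction.
    """
    if any(iou <= 0.5 or iou > 1.0 for iou in matched_ious):
        raise ValueError("matched pairs must have an IoU in (0.5, 1.0]")

    num_true_positives = len(matched_ious)
    denominator = (
        num_true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        # Hypothetical convention for this sketch: score an empty class as 0.
        return 0.0
    return sum(matched_ious) / denominator


# Toy numbers, not results from the paper.
pq = panoptic_quality([0.9, 0.8, 0.7], num_false_positives=1, num_false_negatives=1)
print(f"PQ = {100 * pq:.1f}")
```

PQ is usually reported on a 0 to 100 scale and averaged over classes, so a gain of 2.4 or 4.6 is a difference in that class-averaged score.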