Core Concepts
The authors propose a weakly supervised self-training approach to adapt the pre-trained Segment Anything Model (SAM) to diverse downstream segmentation tasks, addressing SAM's poor generalization under significant distribution shift.
Abstract
The authors identify a generalization issue with the Segment Anything Model (SAM), a state-of-the-art image segmentation foundation model, when it is deployed on diverse downstream tasks such as medical images, camouflaged objects, and robotic images. To address this, they propose a weakly supervised self-training framework that adapts SAM without access to the original source training data.
Key highlights:
- The authors adopt a teacher-student self-training architecture, where the student network is updated using pseudo-labels from the teacher network. To prevent confirmation bias, an anchor network with frozen source weights is introduced to regularize the student updates.
- The authors leverage weak supervision in the form of bounding boxes, sparse points, and coarse segmentation masks as prompts, which are seamlessly integrated into the SAM model.
- A low-rank weight update strategy is employed to enable efficient adaptation of the large SAM encoder network.
- Extensive evaluations on 10 datasets across 5 types of downstream tasks demonstrate the effectiveness of the proposed weakly supervised adaptation approach, outperforming state-of-the-art domain adaptation methods.
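The teacher-student-anchor interplay described in the highlights can be sketched in a few lines. This is a toy illustration, assuming simplified linear "networks" (weight vectors) and made-up hyperparameter names (`ema_decay`, `anchor_weight`); it is not the paper's implementation, only the general self-training pattern it describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(weights, x):
    """Toy per-pixel logit: a linear map followed by a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x @ weights)))

def self_training_step(student, teacher, anchor, x,
                       lr=0.1, ema_decay=0.99, anchor_weight=0.5):
    # 1) The teacher produces pseudo-labels (hard-thresholded here).
    pseudo = (predict(teacher, x) > 0.5).astype(float)
    # 2) Student loss = pseudo-label fit + an anchor regularizer that
    #    keeps the student close to the frozen source weights,
    #    counteracting confirmation bias.
    pred = predict(student, x)
    grad = x.T @ (pred - pseudo) / len(x)        # BCE-style gradient
    grad += anchor_weight * (student - anchor)   # pull toward the anchor
    student = student - lr * grad
    # 3) The teacher tracks the student via an exponential moving average.
    teacher = ema_decay * teacher + (1 - ema_decay) * student
    return student, teacher

x = rng.normal(size=(32, 8))
anchor = rng.normal(size=8)      # frozen source weights
student = anchor.copy()
teacher = anchor.copy()
for _ in range(50):
    student, teacher = self_training_step(student, teacher, anchor, x)
```

The anchor term means the student can fit the teacher's pseudo-labels only insofar as it does not drift arbitrarily far from the source model, which is the regularization role the authors assign to the frozen anchor network.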
The authors show that their weakly supervised adaptation method can significantly improve the generalization of SAM under various distribution shifts, making it more robust for deployment on diverse real-world segmentation tasks.
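The low-rank weight update mentioned in the highlights can be illustrated with a minimal LoRA-style sketch. The layer shape, rank `r`, and initialization below are illustrative assumptions for a single dense layer, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4

W = rng.normal(size=(d_out, d_in))      # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                # zero-initialized: no change at start

def adapted_forward(x):
    # Only A and B are updated during adaptation; W stays frozen,
    # so the effective weight is W + B @ A.
    return x @ (W + B @ A).T

x = rng.normal(size=(2, d_in))
# At initialization B = 0, so the adapted layer matches the frozen one.
assert np.allclose(adapted_forward(x), x @ W.T)
```

With this factorization only `(d_out + d_in) * r` parameters are trained instead of `d_out * d_in`, which is what makes adapting a large encoder like SAM's tractable.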
Stats
The authors report the following key metrics and figures:
- mIoU scores on the COCO-C dataset with various types of visual corruption: 57.34% to 78.50% for the proposed method, compared to 57.34% to 72.83% for direct testing of the pre-trained SAM.
- mIoU scores on the natural image datasets COCO and Pascal VOC: 62.09% to 80.12% for the proposed method, compared to 54.76% to 74.29% for direct testing.
- mIoU scores on the medical image datasets Kvasir-SEG and ISIC: 67.40% to 85.47% for the proposed method, compared to 53.42% to 81.59% for direct testing.
- mIoU scores on the camouflaged object datasets CHAMELEON, CAMO, and COD10K: 45.87% to 75.94% for the proposed method, compared to 39.37% to 66.32% for direct testing.
- mIoU scores on the robotic image datasets OCID and OSD: 77.41% to 92.11% for the proposed method, compared to 71.41% to 87.62% for direct testing.
Quotes
"The authors are motivated by the challenge of deploying SAM on many downstream tasks and propose to adapt SAM to downstream segmentation tasks with weak supervision without requiring access to source domain training data."
"The proposed method is naturally compatible with weak supervisions which could substantially improve the efficacy of adaptation."
"Extensive evaluations on 10 datasets from 5 types of downstream tasks suggest the proposed adaptation method can significantly improve the generalization of SAM under various degrees of distribution shift."