Core Concepts
Proposes a Real-to-Simulation fine-tuning strategy for SAOM that improves multi-class multi-instance segmentation performance.
Abstract
The article introduces the Segment Anything Object Model (SAOM) and its Real-to-Simulation fine-tuning strategy for multi-class multi-instance segmentation. SAOM aims to produce whole-object segmentation masks, which are crucial for indoor scene understanding, especially in robotics applications. The strategy fine-tunes the model on object images and ground-truth data from the Ai2Thor simulator. Using a novel nearest neighbor assignment method, SAOM significantly outperforms the foundational SAM model. Evaluated on a dataset collected from the Ai2Thor simulator, SAOM achieves a 28% increase in mIoU and a 25% increase in mAcc over SAM across 54 indoor object classes. The Real-to-Simulation fine-tuning approach also shows promising generalization to real environments without any prior training on real-world data.
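The abstract mentions a nearest neighbor assignment method used during fine-tuning but does not spell out its details. As a rough illustration only (the paper's exact procedure may differ, and the function and variable names below are hypothetical), one common form of such an assignment is to snap each object's mask centroid to the closest point in a fixed grid of prompt points:

```python
import numpy as np

def nearest_grid_prompt(mask, grid_points):
    """Return the grid prompt point nearest to an object's centroid.

    Illustrative sketch of a nearest-neighbor assignment, not SAOM's
    exact method. `mask` is a binary HxW array; `grid_points` is an
    (N, 2) array of (row, col) prompt coordinates.
    """
    ys, xs = np.nonzero(mask)
    centroid = np.array([ys.mean(), xs.mean()])
    dists = np.linalg.norm(grid_points - centroid, axis=1)
    return grid_points[np.argmin(dists)]

# Toy example: an object in the top-left corner of an 8x8 frame,
# with a 2x2 grid of candidate prompt points.
mask = np.zeros((8, 8), dtype=bool)
mask[0:4, 0:4] = True
grid = np.array([[2, 2], [2, 6], [6, 2], [6, 6]])
print(nearest_grid_prompt(mask, grid))  # → [2 2]
```

The idea is that a fixed prompt grid (as used by SAM's automatic mask generation) rarely lands exactly on an object's center, so mapping each ground-truth object to its nearest grid point gives a consistent prompt for training.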
Stats
SAM can generalize well to natural images but has limitations in real-world applications [9].
SAOM shows a 28% increase in mIoU and a 25% increase in mAcc compared to SAM.
A total of 303,937 object masks were collected from the Ai2Thor simulator.
SAOM reduces the number of output masks by 81.6% compared to SAM.
Quotes
"SAOM significantly improves on SAM with a 28% increase in mIoU and a 25% increase in mAcc."
"Our Real-to-Simulation fine-tuning strategy demonstrates promising generalization performance in real environments."