FIANet is a new referring remote sensing image segmentation method that leverages fine-grained image-text alignment to capture discriminative multi-modal representations, and it outperforms state-of-the-art approaches.
This paper introduces the Rotated Multi-Scale Interaction Network (RMSIN), an approach designed to address the challenges of Referring Remote Sensing Image Segmentation (RRSIS), the task of segmenting specific regions of aerial images based on textual descriptions.
Referring Remote Sensing Image Segmentation (RRSIS) is a novel task in remote sensing that addresses the need to segment objects under linguistic guidance.