The paper presents RRSIS-D, a new large-scale benchmark dataset for referring remote sensing image segmentation (RRSIS). RRSIS-D comprises 17,402 remote sensing image-caption-mask triplets, a significant increase in scale and diversity over previous datasets.
To address the challenges specific to RRSIS, namely the complex spatial scales and object orientations found in aerial imagery, the authors propose the Rotated Multi-Scale Interaction Network (RMSIN). RMSIN uses an Intra-scale Interaction Module (IIM) to extract fine-grained details at multiple scales and a Cross-scale Interaction Module (CIM) to integrate those details coherently across the network. In addition, RMSIN employs Adaptive Rotated Convolution (ARC) in the decoder to account for the diverse orientations of objects, a novel contribution that the authors report significantly improves segmentation accuracy.
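To make the idea of an orientation-aware convolution more concrete, here is a minimal sketch of rotating a kernel before applying it. It is not the authors' ARC implementation: the function names `rotate_kernel` and `conv2d_valid` are hypothetical, the rotation angle is passed in by hand rather than predicted by the network, and bilinear resampling is an assumption chosen for simplicity. How RMSIN actually selects angles is not described in this summary.

```python
import math

def rotate_kernel(kernel, theta):
    """Resample a square kernel rotated by theta radians about its center.

    Illustrative only: uses bilinear interpolation, with zeros outside the kernel.
    """
    k = len(kernel)
    c = (k - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            # Map each output cell back to a (possibly fractional) source position.
            y, x = i - c, j - c
            src_x = cos_t * x + sin_t * y + c
            src_y = -sin_t * x + cos_t * y + c
            x0, y0 = math.floor(src_x), math.floor(src_y)
            fx, fy = src_x - x0, src_y - y0
            acc = 0.0
            # Blend the four neighbouring source cells, skipping out-of-range ones.
            for dy, wy in ((0, 1.0 - fy), (1, fy)):
                for dx, wx in ((0, 1.0 - fx), (1, fx)):
                    yy, xx = y0 + dy, x0 + dx
                    if 0 <= yy < k and 0 <= xx < k:
                        acc += wy * wx * kernel[yy][xx]
            out[i][j] = acc
    return out

def conv2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, the operation deep-learning layers call convolution."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
            for j in range(w - k + 1)
        ]
        for i in range(h - k + 1)
    ]

# A kernel tuned to horizontal structures, and the same kernel rotated by 90 degrees.
horizontal = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
vertical = rotate_kernel(horizontal, math.pi / 2)

# A tiny image containing a vertical bar.
bar = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
response_unrotated = conv2d_valid(bar, horizontal)[0][0]
response_rotated = conv2d_valid(bar, vertical)[0][0]
```

In this toy case the rotated kernel responds more strongly to the vertical bar than the unrotated one, which is the intuition behind letting a decoder adapt its kernels to object orientation.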
Experiments on the RRSIS-D dataset show that RMSIN surpasses existing state-of-the-art models by a significant margin. The authors also conduct extensive ablation studies to validate the effectiveness of each proposed component: the IIM, the CIM, and the ARC.
by Sihan Liu, Yi... at arxiv.org, 04-03-2024
https://arxiv.org/pdf/2312.12470.pdf