The paper presents a novel 3D-ConvSST architecture for efficient hyperspectral image (HSI) classification. The key highlights are:
3D-Convolution Guided Residual Module (CGRM): This module inserts a 3D convolution layer between Transformer encoder blocks to fuse spectral and spatial information and enhance feature propagation.
Global Average Pooling: Instead of a class token, the model applies global average pooling to the final visual tokens to effectively encode discriminative high-level features for classification.
Extensive experiments on three public HSI datasets (Houston, MUUFL, Botswana) demonstrate the superiority of the proposed 3D-ConvSST over state-of-the-art traditional, convolutional, and Transformer-based models in terms of overall accuracy, average accuracy, and kappa coefficient.
Qualitative analysis shows that 3D-ConvSST produces the best classification maps, with better spatial-spectral characterization than the other methods.
Ablation studies validate the importance of both the CGRM and global average pooling modules in the 3D-ConvSST architecture.
The optimal Transformer encoder depth varies across datasets: Houston favors a shallower model, while MUUFL and Botswana benefit from deeper ones.
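The CGRM and global-average-pooling highlights above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the authors' implementation: the tokens are taken to sit on an H×W spatial grid with C embedding channels, the CGRM is modeled as a single-channel "same"-padded 3D convolution over the (C, H, W) volume that is added back as a residual, and `encoder_block` is a hypothetical stripped-down encoder (single-head self-attention only, no MLP). The names `conv3d_same`, `encoder_block`, `cgrm`, and `forward` are likewise hypothetical.

```python
import numpy as np

def conv3d_same(x, kernel):
    """Naive 'same'-padded 3D cross-correlation of a single-channel volume (zero padding)."""
    kd, kh, kw = kernel.shape
    padded = np.pad(x, ((kd // 2, kd // 2), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    d, h, w = x.shape
    for i in range(d):
        for j in range(h):
            for k in range(w):
                out[i, j, k] = np.sum(padded[i:i + kd, j:j + kh, k:k + kw] * kernel)
    return out

def layer_norm(x, eps=1e-5):
    """Normalize each token over its channel dimension (no learned scale/shift)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(tokens, wq, wk, wv):
    """Hypothetical minimal encoder: pre-norm single-head self-attention plus a residual."""
    x = layer_norm(tokens)
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return tokens + attn @ v

def cgrm(tokens, grid_hw, kernel):
    """Hypothetical CGRM: tokens (H*W, C) -> volume (C, H, W) -> 3D conv -> residual add."""
    h, w = grid_hw
    _, c = tokens.shape
    volume = tokens.T.reshape(c, h, w)  # assumption: channel axis acts as the depth axis
    fused = conv3d_same(volume, kernel)
    return tokens + fused.reshape(c, h * w).T

def forward(tokens, grid_hw, params, kernel, head_w):
    """Encoder stack with a CGRM between consecutive blocks, then a GAP classifier head."""
    x = tokens
    for i, (wq, wk, wv) in enumerate(params):
        x = encoder_block(x, wq, wk, wv)
        if i < len(params) - 1:  # CGRM only *between* encoder blocks
            x = cgrm(x, grid_hw, kernel)
    pooled = x.mean(axis=0)  # global average pooling over all tokens, no class token
    return pooled @ head_w   # class logits

rng = np.random.default_rng(0)
h, w, c, n_classes, depth = 4, 4, 8, 5, 3
tokens = rng.standard_normal((h * w, c))
params = [tuple(rng.standard_normal((c, c)) * 0.1 for _ in range(3)) for _ in range(depth)]
kernel = rng.standard_normal((3, 3, 3)) * 0.1
head_w = rng.standard_normal((c, n_classes))
logits = forward(tokens, (h, w), params, kernel, head_w)
print(logits.shape)  # (5,)
```

One way to read the pooling choice: averaging over every token lets each spatial position contribute directly to the final feature vector, whereas a class token has to collect that information through attention alone.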
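For reference, the three metrics cited in the experiments (overall accuracy, average accuracy, and the kappa coefficient) can be computed from a confusion matrix. The plain-Python sketch below uses the standard definitions: OA is the fraction of correctly classified samples, AA is the mean per-class accuracy, and Cohen's kappa is κ = (p_o − p_e) / (1 − p_e), where p_o = OA and p_e is the agreement expected by chance. The function name `classification_metrics` and the rows-are-true-classes convention are illustrative assumptions, not taken from the paper.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) for a square confusion matrix.

    Assumed convention: confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    oa = correct / total  # overall accuracy (p_o)

    # Average accuracy: mean per-class accuracy (diagonal / row sum), skipping empty classes.
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(k) if sum(confusion[i]) > 0]
    aa = sum(per_class) / len(per_class)

    # Chance agreement p_e: sum over classes of (row total * column total) / total^2.
    row_tot = [sum(confusion[i]) for i in range(k)]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    pe = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2

    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

oa, aa, kappa = classification_metrics([[50, 10], [5, 35]])
print(round(oa, 4), round(aa, 4), round(kappa, 4))  # 0.85 0.8542 0.6939
```

Because AA weights every class equally, it is more sensitive than OA to errors on small classes, which is why the two are usually reported together.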
Key insights extracted from the paper by Shyam Varaha... (arxiv.org, 04-23-2024)
https://arxiv.org/pdf/2404.13252.pdf