The paper proposes a self-supervised learning approach for anomaly detection in aerial agricultural images using Masked Image Modeling. The key insights are:
Traditional supervised learning methods for anomaly detection struggle to adapt to diverse anomaly types and require extensive annotated data.
The authors overcome this limitation by leveraging a Masked Autoencoder (MAE) architecture, which extracts meaningful normal features from unlabeled image samples. This allows the model to detect anomalies based on high reconstruction errors for abnormal pixels.
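The reconstruction-error criterion can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the squared-error metric and the threshold value are assumptions chosen for clarity.

```python
import numpy as np

def anomaly_map(original, reconstruction, threshold=0.1):
    """Flag pixels the autoencoder reconstructs poorly as anomalous.

    The model learns to reconstruct normal appearance, so anomalous
    regions incur high reconstruction error. The threshold here is
    illustrative, not the paper's choice.
    """
    # Per-pixel squared error, averaged over channels
    error = np.mean((original - reconstruction) ** 2, axis=-1)
    return error > threshold, error

rng = np.random.default_rng(0)
img = rng.random((8, 8, 3))
recon = img.copy()
recon[2:4, 2:4] += 0.5  # simulate a poorly reconstructed (anomalous) 2x2 patch
mask, err = anomaly_map(img, recon, threshold=0.1)
print(mask.sum())  # 4 pixels flagged anomalous
```

In practice the reconstruction would come from the trained autoencoder rather than a perturbed copy of the input.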
To remove the need for curating anomaly-free training data, the authors introduce an Anomaly Suppression Loss. This mechanism minimizes the contribution of anomalous pixels to the reconstruction objective, allowing the model to train on data containing anomalies without explicitly filtering for "normal" images.
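A minimal sketch of a suppression-style loss, under the assumption that suppression works by excluding the highest-error pixels from the reconstruction objective; the paper's exact formulation may differ:

```python
import numpy as np

def suppressed_recon_loss(pred, target, suppress_frac=0.05):
    """Reconstruction loss that suppresses likely-anomalous pixels.

    Illustrative sketch (not the paper's exact loss): the pixels with
    the largest reconstruction error are treated as likely anomalies
    and dropped from the loss, so they do not pull the model toward
    reconstructing anomalous content.
    """
    # Per-pixel squared error, averaged over the channel axis
    err = np.mean((pred - target) ** 2, axis=0)
    flat = np.sort(err.ravel())                 # errors, ascending
    k = int(flat.size * (1 - suppress_frac))    # number of pixels kept
    return flat[:k].mean()                      # mean over lowest-error pixels

pred = np.zeros((3, 4, 4))
target = np.zeros((3, 4, 4))
target[:, 0, 0] = 10.0  # one "anomalous" pixel with huge error
loss = suppressed_recon_loss(pred, target, suppress_frac=0.1)
print(loss)  # anomalous pixel excluded -> loss is 0.0
```

Without the suppression term, the single high-error pixel would dominate the loss and push the model to reconstruct the anomaly.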
The authors use a Swin Transformer-based Masked Autoencoder (SwinMAE) to capture both local and global features, enabling robust anomaly detection across a wide range of anomaly types.
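MAE-style pretraining hides a large fraction of image patches and trains the model to reconstruct them. A minimal sketch of the random patch-masking step; the 75% ratio is the one commonly used in MAE-style pretraining and is an assumption here, not a figure from the paper:

```python
import numpy as np

def random_patch_mask(h_patches, w_patches, mask_ratio=0.75, seed=0):
    """Randomly mask a fraction of image patches, MAE-style.

    True entries mark hidden patches: the encoder sees only the
    visible patches and the decoder reconstructs the hidden ones.
    """
    rng = np.random.default_rng(seed)
    n = h_patches * w_patches
    n_mask = int(n * mask_ratio)
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_mask, replace=False)] = True
    return mask.reshape(h_patches, w_patches)

m = random_patch_mask(8, 8, mask_ratio=0.75)
print(m.sum())  # 48 of 64 patches masked
```

In a Swin-based variant, attention is computed within shifted local windows over the visible patches, which is what lets the model combine local detail with global context.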
Evaluation on the Agriculture-Vision dataset shows a 6.3% mIOU score improvement over prior state-of-the-art unsupervised and self-supervised methods. The single SwinMAE model generalizes well across all the anomaly categories in the dataset.
Key insights from the original content by Sambal Shikh... at arxiv.org, 04-16-2024
https://arxiv.org/pdf/2404.08931.pdf