The paper proposes a novel Arbitrary Modality Salient Object Detection (AM SOD) task, which aims to detect salient objects from input images of arbitrary modality types (e.g. RGB, depth, thermal) and arbitrary modality numbers (e.g. single-modal, dual-modal, triple-modal) using a single model. This is in contrast to existing salient object detection (SOD) models that are designed or trained for specific modality types and numbers.
The key challenges addressed are: 1) Modality discrepancies in unimodal feature extraction - how to adaptively extract discriminative features from arbitrary modalities using a single feature extractor; and 2) Dynamic inputs in multi-modal feature fusion - how to dynamically fuse unimodal features from an arbitrary number of modalities.
To tackle these challenges, the paper proposes a Modality Switch Network (MSN) with two main components, one per challenge: a modality-adaptive feature extractor that extracts discriminative unimodal features from arbitrary modality types with a single extractor, and a dynamic fusion module that combines unimodal features from an arbitrary number of modalities.
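To make the two components concrete, here is a minimal sketch of the idea (not the paper's actual architecture): one shared extractor is applied to whatever modalities arrive, and a fusion step that works for any count of inputs (here, a softmax-weighted average) combines them. All function names and the linear-map extractor are illustrative stand-ins.

```python
import numpy as np

def extract_features(x, w):
    # Shared extractor: a single weight matrix applied to any modality.
    # (The paper's MSN adapts per modality; a fixed linear map is a stand-in.)
    return np.tanh(x @ w)

def fuse(features):
    # Dynamic fusion over an arbitrary number of modalities:
    # softmax-weighted average, so 1, 2, or 3 unimodal features all work.
    feats = np.stack(features)            # (num_modalities, d)
    scores = feats.mean(axis=1)           # one scalar score per modality
    weights = np.exp(scores) / np.exp(scores).sum()
    return (weights[:, None] * feats).sum(axis=0)  # (d,)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
rgb, depth, thermal = (rng.standard_normal(4) for _ in range(3))

# Same model handles single-modal and triple-modal inputs.
single = fuse([extract_features(rgb, w)])
triple = fuse([extract_features(m, w) for m in (rgb, depth, thermal)])
print(single.shape, triple.shape)
```

The point of the sketch is only the interface: nothing in `fuse` depends on how many modalities were passed in, which is the property the AM SOD setting requires.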
Additionally, the paper introduces a new AM SOD dataset, AM-XD, to facilitate research in this area. Extensive experiments demonstrate the effectiveness of the proposed MSN in handling arbitrary input modalities and numbers for robust salient object detection.
Source: Nianchang Hu... at arxiv.org, 05-07-2024 — https://arxiv.org/pdf/2405.03352.pdf