The paper proposes a novel Arbitrary Modality Salient Object Detection (AM SOD) task, which aims to detect salient objects from input images of arbitrary modality types (e.g., RGB, depth, thermal) and arbitrary modality numbers (e.g., single-modal, dual-modal, triple-modal) using a single model. This contrasts with existing salient object detection (SOD) models, which are designed or trained for a specific modality type and number.
The key challenges addressed are: 1) Modality discrepancies in unimodal feature extraction - how to adaptively extract discriminative features from arbitrary modalities using a single feature extractor; and 2) Dynamic inputs in multi-modal feature fusion - how to dynamically fuse unimodal features from an arbitrary number of modalities.
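The first challenge can be made concrete with a toy sketch: one shared set of extractor weights serves every modality, and a small per-modality "switch" vector conditions the shared transform on the input type. The names (`W`, `PROMPTS`, `extract`) and the prompt-as-additive-bias design are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights reused for every modality (hypothetical toy extractor).
W = rng.standard_normal((8, 4))

# One "switch" vector per supported modality type.
PROMPTS = {
    "rgb":     rng.standard_normal(8),
    "depth":   rng.standard_normal(8),
    "thermal": rng.standard_normal(8),
}

def extract(x, modality):
    """Extract features with one shared extractor, conditioned on modality.

    x: (4,) input vector; modality: key into PROMPTS.
    The prompt biases the shared transform toward the given modality,
    so the same weights W handle arbitrary modality types.
    """
    return np.tanh(W @ x + PROMPTS[modality])

x = rng.standard_normal(4)
f_rgb = extract(x, "rgb")
f_thermal = extract(x, "thermal")
```

The same input run through different modality switches yields different features from identical shared weights, which is the behavior the single-extractor challenge asks for.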
To tackle these challenges, the paper proposes a Modality Switch Network (MSN) with two main components, one per challenge: a unimodal feature extractor that adapts to the modality type of each input, and a fusion module that dynamically combines features from however many modalities are provided.
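The second challenge, fusing a variable number of unimodal features, can be illustrated with a minimal stand-in: weight each modality's feature map by a softmax gate computed per call, so the same code path handles one, two, or three modalities. The function name and the global-average gating scheme are assumptions for illustration, not the paper's fusion design.

```python
import numpy as np

def dynamic_fuse(features):
    """Fuse an arbitrary number of unimodal feature maps.

    features: list of (C, H, W) arrays, one per input modality; any
    list length works because the gates are computed at call time.
    Each modality gets a scalar score from its global average response,
    the scores are softmax-normalized, and the maps are combined by
    weighted sum.
    """
    stack = np.stack(features)                   # (M, C, H, W)
    scores = stack.mean(axis=(1, 2, 3))          # one score per modality
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over modalities
    return np.tensordot(weights, stack, axes=1)  # back to (C, H, W)

rgb     = np.ones((3, 4, 4))
depth   = np.zeros((3, 4, 4))
thermal = np.full((3, 4, 4), 0.5)

# Single-modal and triple-modal inputs share one code path.
single = dynamic_fuse([rgb])
triple = dynamic_fuse([rgb, depth, thermal])
```

With one modality the softmax weight is 1 and the input passes through unchanged; with three, the output is a convex combination of the three maps.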
Additionally, the paper introduces a new AM SOD dataset, AM-XD, to facilitate research in this area. Extensive experiments demonstrate the effectiveness of the proposed MSN in handling arbitrary input modalities and numbers for robust salient object detection.