A novel modality-adaptive Transformer (MAT) is proposed to effectively detect salient objects from inputs of arbitrary modality. It uses modality prompts to adaptively adjust the feature space and dynamically fuses unimodal features across the available modalities.
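We propose a modality-adaptive Transformer model that overcomes the differences between modalities and performs dynamic fusion, so that salient objects can be detected effectively from inputs of arbitrary modality.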
The proposed Arbitrary Modality Salient Object Detection (AM SOD) model can effectively detect salient objects from input images of arbitrary modality type (e.g., RGB, depth, thermal) and an arbitrary number of modalities (e.g., single-modal, dual-modal, triple-modal) within a single unified framework.
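To make the idea of one framework serving any combination of modalities concrete, the following is a minimal sketch in plain Python, not the MAT architecture itself. It assumes each modality already yields a small feature vector, shifts that vector by a per-modality prompt, and fuses whatever modalities are present with softmax weights. The names `PROMPTS`, `apply_prompt`, and `fuse`, the fixed prompt values, and the mean-activation scoring rule are all hypothetical choices made for illustration; a real model would learn the prompts and the fusion weights.

```python
import math

# Hypothetical per-modality prompt vectors. In a trained model these would be
# learned parameters; the fixed values here are for illustration only.
PROMPTS = {
    "rgb": [0.1, 0.0, -0.1],
    "depth": [0.0, 0.2, 0.0],
    "thermal": [-0.1, 0.0, 0.1],
}


def apply_prompt(modality, feature):
    """Shift a unimodal feature vector by the prompt of its modality."""
    prompt = PROMPTS[modality]
    return [f + p for f, p in zip(feature, prompt)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(inputs):
    """Fuse a non-empty dict {modality: feature} into a single vector.

    The number of modalities is not fixed: one, two, or three entries
    all go through the same code path.
    """
    if not inputs:
        raise ValueError("at least one modality is required")
    adjusted = [apply_prompt(m, f) for m, f in inputs.items()]
    # Assumed scoring rule: mean activation of each prompt-adjusted feature.
    scores = [sum(f) / len(f) for f in adjusted]
    weights = softmax(scores)
    dim = len(adjusted[0])
    # Weighted sum of the adjusted features, one output value per dimension.
    return [sum(w * f[i] for w, f in zip(weights, adjusted)) for i in range(dim)]


single = fuse({"rgb": [1.0, 2.0, 3.0]})
triple = fuse({
    "rgb": [1.0, 2.0, 3.0],
    "depth": [0.5, 0.5, 0.5],
    "thermal": [2.0, 1.0, 0.0],
})
```

With a single modality the softmax weight is 1, so the output is simply the prompt-adjusted feature. With several modalities the output is a convex combination of the adjusted features, which is why the same function handles single-modal, dual-modal, and triple-modal inputs without modification.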