Core Concept
Mapping visual observations to actional atomic concepts enhances navigation performance in Vision-Language Navigation (VLN) tasks.
Summary
This study introduces Actional Atomic-Concept Learning (AACL) to bridge the semantic gap between visual observations and language instructions in Vision-Language Navigation (VLN). AACL maps visual observations to actional atomic concepts, which simplifies multi-modal alignment and improves interpretability. The method consists of three components: concept mapping, a concept refining adapter, and an observation co-embedding module. Experiments on VLN benchmarks show that AACL achieves state-of-the-art results by enhancing observation features and simplifying alignment.
Statistics
AACL establishes new state-of-the-art results on VLN benchmarks.
AACL outperforms baseline agents HAMT and DUET on various metrics.
The temperature parameter τ is set to 0.5 for object concept mapping.
The learning rate of the concept refining adapter is set to 0.1.
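The concept-mapping step described above can be pictured as scoring a visual observation feature against a bank of atomic-concept embeddings and applying a temperature-scaled softmax, with τ = 0.5 as quoted for object concept mapping. The sketch below is illustrative only: the function and variable names are hypothetical, and the paper's actual feature extractor and concept vocabulary are not reproduced here.

```python
import numpy as np

def map_to_concepts(obs_feat, concept_embs, tau=0.5):
    """Map a visual observation feature to a distribution over
    actional atomic concepts via temperature-scaled cosine similarity.

    tau=0.5 matches the object-concept-mapping setting cited above;
    all names here are illustrative, not taken from the paper.
    """
    # L2-normalize so dot products become cosine similarities
    obs = obs_feat / np.linalg.norm(obs_feat)
    cons = concept_embs / np.linalg.norm(concept_embs, axis=1, keepdims=True)
    logits = cons @ obs / tau  # lower tau -> sharper (more confident) mapping
    # numerically stable softmax over the concept bank
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Toy usage: 3 orthogonal "concepts" in a 4-dim feature space
concepts = np.array([[1., 0., 0., 0.],
                     [0., 1., 0., 0.],
                     [0., 0., 1., 0.]])
obs = np.array([0.1, 0.9, 0.0, 0.0])  # observation closest to concept 1
probs = map_to_concepts(obs, concepts)
```

A low temperature sharpens the distribution, so each observation candidate is pulled toward a single dominant atomic concept, which is how such a mapping can make candidates easier to distinguish during alignment.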
Quotes
"AACL simplifies the multi-modal alignment and distinguishes different observation candidates easily."
"AACL significantly improves interpretability in action decision."