Causality-based Cross-Modal Representation Learning for Vision-and-Language Navigation
The author proposes CausalVLN, a novel framework based on causal learning that improves the generalization of navigators by addressing biased associations and confounders in vision-and-language tasks.