Self-Supervised Vision-Action Pre-Training for Visual Navigation
In this work, the authors propose VANP, a Self-Supervised Vision-Action Model for Visual Navigation Pre-Training, which learns to attend to visual regions relevant to the navigation task rather than to generic, task-agnostic features. By leveraging self-supervision signals and Transformer encoders, VANP maximizes the mutual information between embeddings to enhance navigation performance.
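A common way to realize this kind of objective is a contrastive, InfoNCE-style loss that pulls paired vision and action embeddings together while pushing apart mismatched pairs within a batch. The sketch below is illustrative only: the encoder sizes, the pooling strategy, and the choice of InfoNCE as the mutual-information surrogate are assumptions, not the authors' exact implementation, and all module and variable names are hypothetical.

```python
# Minimal sketch: two small Transformer encoders embed visual history and
# future actions; an InfoNCE-style loss maximizes a lower bound on the
# mutual information between the paired embeddings. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceEncoder(nn.Module):
    """Small Transformer encoder that pools a token sequence into one embedding."""

    def __init__(self, input_dim: int, embed_dim: int = 256, num_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(input_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim) -> (batch, embed_dim)
        h = self.encoder(self.proj(x))
        return h.mean(dim=1)  # simple mean pooling over the sequence


def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.07):
    """Matched (vision, action) pairs are positives; all other batch pairs are negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature  # (batch, batch) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # i-th row pairs with i-th column
    return F.cross_entropy(logits, targets)


# Toy usage with hypothetical shapes: per-frame visual features and 2-D waypoints.
vision_enc = SequenceEncoder(input_dim=512)  # e.g. CNN features per past frame
action_enc = SequenceEncoder(input_dim=2)    # e.g. (dx, dy) future waypoints

frames = torch.randn(8, 6, 512)   # batch of 8 trajectories, 6 past frames each
actions = torch.randn(8, 5, 2)    # batch of 8 trajectories, 5 future waypoints each
loss = info_nce(vision_enc(frames), action_enc(actions))
loss.backward()
```

Minimizing this loss encourages each visual embedding to be predictive of its own trajectory's actions and not of others', which is one standard surrogate for maximizing mutual information between the two modalities.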