NavCoT enhances LLM-based navigation by simplifying action decisions through disentangled reasoning. It significantly improves performance on various benchmarks, showcasing the effectiveness of in-domain training and explicit reasoning generation.
Recent advances in large language models (LLMs) have shown promise for Vision-and-Language Navigation (VLN). However, using them off-the-shelf without adaptation often introduces a domain gap. NavCoT addresses this by enabling self-guided navigational decision-making through parameter-efficient in-domain training. By prompting the LLM to forecast the navigational chain-of-thought before committing to an action, NavCoT simplifies action prediction and improves interpretability. Experiments show NavCoT outperforming direct-action-prediction variants, underscoring its potential for scalable LLM-based embodied agents.
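To make the idea concrete, here is a minimal sketch of what "forecasting the navigational chain-of-thought" could look like as a prompt-and-parse loop. The three-step wording, function names, and output format below are illustrative assumptions for this summary, not the paper's exact template.

```python
# Illustrative sketch of a disentangled navigational chain-of-thought,
# in the spirit of NavCoT. The step names and prompt wording are
# assumptions, not the paper's exact format.

def build_navcot_prompt(instruction: str, candidates: list[str]) -> str:
    """Ask the LLM to reason in explicit steps before choosing an action."""
    options = "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
    return (
        f"Instruction: {instruction}\n"
        f"Candidate observations:\n{options}\n"
        "Step 1 (Imagination): describe the scene the instruction implies next.\n"
        "Step 2 (Filtering): pick the candidate that best matches the imagination.\n"
        "Step 3 (Prediction): output the chosen index as 'Action: <i>'.\n"
    )

def parse_action(llm_output: str, num_candidates: int) -> int:
    """Extract the final action index; fall back to 0 if malformed."""
    for line in llm_output.splitlines():
        if line.strip().lower().startswith("action:"):
            try:
                idx = int(line.split(":", 1)[1].strip().strip("<>"))
                if 0 <= idx < num_candidates:
                    return idx
            except ValueError:
                pass
    return 0
```

Separating imagination, matching, and prediction is what makes the final choice auditable: a failure can be traced to a wrong imagined scene or a wrong match rather than an opaque action token.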
Through formalized training labels and parameter-efficient finetuning, NavCoT surpasses high-cost LLM-based approaches, achieving a significant relative improvement on VLN benchmarks such as Room-to-Room (R2R). The method combines world-model theory with chain-of-thought reasoning to improve navigation performance and scalability while preserving task adaptability and interpretability. Overall, NavCoT is a promising step toward real-world robotics applications.
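The "formalized labels" idea can be sketched as pairing each navigation prompt with a full reasoning trace as the finetuning target, so the model learns to emit the chain rather than a bare action. The field names and template below are illustrative assumptions, not NavCoT's exact label format.

```python
# A minimal sketch of assembling a formalized training example from
# ground-truth navigation data. The template and field names are
# illustrative assumptions, not NavCoT's exact format.

def make_training_example(instruction: str, candidates: list[str],
                          gt_index: int, gt_scene: str) -> dict:
    """Pair a navigation prompt with a disentangled reasoning target."""
    prompt = (
        f"Instruction: {instruction}\n"
        + "\n".join(f"({i}) {c}" for i, c in enumerate(candidates))
    )
    # The target teaches the LLM to produce the full reasoning chain,
    # ending in the ground-truth action, not just the action index.
    target = (
        f"Imagination: {gt_scene}\n"
        f"Filtering: candidate ({gt_index}) matches the imagination.\n"
        f"Action: {gt_index}"
    )
    return {"input": prompt, "output": target}
```

Pairs like this can then drive parameter-efficient finetuning (e.g. adapter- or LoRA-style updates), which is what keeps in-domain training cheap relative to full-model finetuning.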
Key insights distilled from the paper by Bingqian Lin... at arxiv.org, 03-13-2024.
https://arxiv.org/pdf/2403.07376.pdf