Core Concepts
NavCoT is a strategy for Vision-and-Language Navigation (VLN) that enables self-guided navigational decision-making through disentangled reasoning, yielding significant performance improvements over direct action prediction.
Abstract
The paper introduces NavCoT, a strategy for VLN that improves navigational decision-making through disentangled reasoning. It narrows the domain gap between VLN tasks and large language models by training LLMs, in a parameter-efficient in-domain manner, to generate navigational chain-of-thought outputs. This improves the interpretability and scalability of LLM-based embodied agents and yields better performance than direct action prediction variants.
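To make the disentangled reasoning concrete, the sketch below shows one way such a navigational chain-of-thought prompt could be composed: the LLM is asked to first imagine the observation implied by the instruction, then filter the candidate observations against that imagination, and finally predict an action. This is a minimal illustration, not the paper's exact prompt; the instruction text, candidate observations, and the `call_llm` placeholder are all hypothetical.

```python
# Hypothetical sketch of a navigational chain-of-thought prompt.
# The wording and structure are assumptions for illustration; `call_llm`
# stands in for any text-completion API.

def build_navcot_prompt(instruction: str, candidates: list[str]) -> str:
    """Compose a prompt asking for imagination -> filtering -> action prediction."""
    options = "\n".join(f"({chr(65 + i)}) {obs}" for i, obs in enumerate(candidates))
    return (
        f"Instruction: {instruction}\n"
        f"Candidate observations:\n{options}\n"
        "Think step by step:\n"
        "1. Imagination: describe the scene the instruction implies next.\n"
        "2. Filtering: state which candidate best matches that imagination.\n"
        "3. Prediction: output the chosen option letter as the action.\n"
    )

prompt = build_navcot_prompt(
    "Walk past the sofa and stop at the kitchen door.",
    ["a sofa with a lamp beside it", "a kitchen doorway", "a staircase going up"],
)
# response = call_llm(prompt)  # expected to end with something like "Prediction: (B)"
```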
Introduction to Vision-and-Language Navigation (VLN)
Role of Large Language Models (LLMs) in VLN tasks
Challenges in utilizing LLMs for navigation decisions
Introduction of NavCoT strategy for parameter-efficient in-domain training (see the fine-tuning sketch after this list)
Explanation of Navigational Chain-of-Thought concept and its components
Experimental results showcasing superiority over direct action prediction variants
Contributions of NavCoT in enhancing interpretability and scalability of LLM-based agents
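Parameter-efficient in-domain training typically means fine-tuning only a small set of adapter weights on task-formatted data rather than the full LLM. The sketch below uses LoRA via Hugging Face PEFT as one such recipe; the checkpoint name, hyperparameters, and data formatting are assumptions for illustration and are not claimed to be NavCoT's exact configuration.

```python
# Illustrative parameter-efficient fine-tuning setup (LoRA via Hugging Face PEFT).
# Model name and hyperparameters are placeholders, not NavCoT's reported settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_cfg = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter weights are trained

# Training would then maximize the likelihood of chain-of-thought targets
# (imagination, filtering, prediction) given the VLN prompt, e.g. with the
# standard transformers Trainer on (prompt, target) pairs.
```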
Stats
The paper reports that NavCoT outperforms a GPT-4-based approach by a 7% relative improvement on the R2R dataset.
NavCoT also showed superior performance over the direct action prediction variants.