Core Concepts
NavCoT is a strategy for Vision-and-Language Navigation (VLN) that enables self-guided navigational decision-making through disentangled reasoning, yielding significant performance improvements.
Summary
The paper introduces NavCoT, a strategy for VLN that enhances navigational decision-making through disentangled reasoning. It addresses the domain gap between VLN tasks and large language models (LLMs) via parameter-efficient in-domain training, and demonstrates superior performance over direct action prediction variants. The method trains LLMs to generate navigational chain-of-thought outputs, improving the interpretability and scalability of LLM-based embodied agents.
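As a rough illustration of the disentangled reasoning just described, the sketch below assembles one navigation step as three prompted stages: imagination, filtering, and action prediction. `llm_generate`, the prompt wording, and the candidate format are hypothetical placeholders, not the paper's actual templates.

```python
def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for any instruction-tuned LLM backend."""
    raise NotImplementedError("plug in a real LLM call here")


def navcot_step(instruction: str, history: list[str], candidates: list[str]) -> str:
    """One NavCoT-style step: imagine, filter, then predict an action."""
    options = "\n".join(f"({i}) {obs}" for i, obs in enumerate(candidates))
    prompt = (
        f"Instruction: {instruction}\n"
        f"History: {' -> '.join(history) if history else 'start'}\n"
        f"Candidate observations:\n{options}\n"
        # Stage 1 -- world model: imagine the observation the instruction implies next.
        "Imagination: describe the scene the instruction implies next.\n"
        # Stage 2 -- filtering: match the imagination against the candidates.
        "Thought: state which candidate best matches the imagination.\n"
        # Stage 3 -- action prediction: commit to a single candidate index.
        "Action: answer with one candidate number.\n"
    )
    return llm_generate(prompt)
```

Keeping the three stages as explicitly labeled fields is what makes the agent's reasoning inspectable, which is where the interpretability claim comes from.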
- Introduction to Vision-and-Language Navigation (VLN)
- Role of Large Language Models (LLMs) in VLN tasks
- Challenges in utilizing LLMs for navigation decisions
- Introduction of the NavCoT strategy with parameter-efficient in-domain training (see the fine-tuning sketch after this list)
- Explanation of the Navigational Chain-of-Thought concept and its components (sketched in code above)
- Experimental results showcasing superiority over direct action prediction variants
- Contributions of NavCoT in enhancing interpretability and scalability of LLM-based agents
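Parameter-efficient in-domain training of this kind is commonly realized with LoRA adapters; the minimal sketch below uses Hugging Face PEFT under that assumption. The base model name and hyperparameters are illustrative, not the paper's reported settings.

```python
# Minimal LoRA fine-tuning setup via Hugging Face PEFT; model name and
# hyperparameters are illustrative assumptions, not the paper's settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=8,                                  # rank of the low-rank adapters
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)   # freezes the base, adds adapters
model.print_trainable_parameters()       # typically well under 1% trainable
# Fine-tune `model` on in-domain pairs: (instruction, observations) inputs
# mapped to (imagination, thought, action) chain-of-thought targets.
```

Because only the adapters receive gradients, the base LLM stays frozen, which is what keeps this style of in-domain training cheap enough to scale.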
Statistics
The paper shows that NavCoT outperforms a GPT-4-based approach by a 7% relative improvement on the R2R dataset.
NavCoT also outperformed the direct action prediction variants.