The proposed Hierarchical Neural Radiance (HNR) representation model predicts multi-level semantic features for future, not-yet-visited environments, enabling efficient lookahead exploration and improved navigation planning in continuous vision-language navigation tasks.
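To make the lookahead idea concrete, here is a minimal sketch of how predicted multi-level semantic features for candidate future viewpoints might be scored against an instruction embedding to guide planning. The class name `LookaheadScorer`, the three-level feature layout, and the fusion scheme are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LookaheadScorer(nn.Module):
    """Hypothetical sketch: score candidate future viewpoints by fusing
    predicted multi-level semantic features with an instruction embedding."""

    def __init__(self, feat_dim=512, num_levels=3):
        super().__init__()
        # One projection head per semantic level (e.g., region / object / detail)
        self.heads = nn.ModuleList(nn.Linear(feat_dim, feat_dim) for _ in range(num_levels))
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, predicted_feats, instruction_feat):
        # predicted_feats: (num_candidates, num_levels, feat_dim) features
        #   rendered by the radiance-style model for each unvisited viewpoint
        # instruction_feat: (feat_dim,) pooled instruction embedding
        fused = 0
        for level, head in enumerate(self.heads):
            fused = fused + head(predicted_feats[:, level])  # aggregate levels
        fused = fused * instruction_feat.unsqueeze(0)  # condition on instruction
        return self.score(fused).squeeze(-1)  # one score per candidate

# Usage: pick the candidate viewpoint with the highest lookahead score
scorer = LookaheadScorer()
cand_feats = torch.randn(4, 3, 512)  # 4 candidate viewpoints, 3 semantic levels
instr = torch.randn(512)
best = scorer(cand_feats, instr).argmax().item()
```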
The core message of this paper is that top-down semantic maps are a viable primary input for generating navigation instructions, and that augmenting them with region names, actions, and prompts further improves the quality of the generated instructions.
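As a rough illustration of the map-as-input idea, the sketch below serializes a top-down semantic map and an action sequence into a text prompt for an instruction-generation model. The label set, prompt wording, and helper name `map_to_prompt` are assumptions for illustration, not the paper's actual pipeline.

```python
# Hypothetical sketch: turn a top-down semantic map plus actions into a
# prompt for generating a navigation instruction.

REGION_LABELS = {0: "hallway", 1: "kitchen", 2: "living room", 3: "bedroom"}

def map_to_prompt(semantic_map, actions):
    """semantic_map: 2D grid of region ids along the agent's path (top-down view);
    actions: list of discrete moves, e.g. ["forward", "turn right", ...]."""
    regions = []
    for row in semantic_map:
        for cell in row:
            name = REGION_LABELS.get(cell)
            if name and name not in regions:
                regions.append(name)  # keep traversal order, drop repeats
    return (
        "Regions traversed (from a top-down semantic map): "
        + ", ".join(regions)
        + ". Actions taken: " + ", ".join(actions)
        + ". Write a concise navigation instruction for this route."
    )

prompt = map_to_prompt([[0, 0, 1], [1, 2, 2]], ["forward", "turn right", "forward"])
print(prompt)
```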
Combining Large Language Models (LLMs) with the TINA framework enhances zero-shot navigation capabilities in Vision-Language Navigation tasks.
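For intuition, here is a minimal sketch of a zero-shot LLM navigation loop in the spirit of such frameworks: at each step, the LLM is prompted with the instruction and a textual observation and asked to choose the next action. The `query_llm` callable stands in for any chat-completion API, and the prompt format and action set are assumptions, not TINA's actual design.

```python
# Hypothetical sketch of zero-shot LLM-driven navigation.

ACTIONS = ["forward", "turn left", "turn right", "stop"]

def navigate(instruction, get_observation, execute, query_llm, max_steps=20):
    """Loop: observe, ask the LLM for the next action, execute it."""
    for step in range(max_steps):
        obs = get_observation()  # textual scene description at current pose
        prompt = (
            f"Instruction: {instruction}\n"
            f"Current view: {obs}\n"
            f"Choose one action from {ACTIONS} and reply with it only."
        )
        action = query_llm(prompt).strip().lower()
        if action not in ACTIONS:
            action = "forward"  # fall back on unparseable replies
        if action == "stop":
            return step  # episode ends when the LLM decides to stop
        execute(action)
    return max_steps
```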