Bibliographic Information: Yin, H., Xu, X., Wu, Z., Jie, Z., & Lu, J. (2024). SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation. arXiv preprint arXiv:2410.08189.
Research Objective: This paper introduces SG-Nav, a novel framework designed to address the limitations of existing zero-shot object navigation methods by leveraging the reasoning capabilities of Large Language Models (LLMs) and the rich contextual information provided by 3D scene graphs.
Methodology: SG-Nav constructs an online hierarchical 3D scene graph that captures spatial relationships between objects, groups, and rooms. This graph is incrementally updated as the agent explores the environment. The framework employs a hierarchical chain-of-thought prompting technique to interact with the LLM, enabling it to reason about the goal location based on the scene context. Additionally, a graph-based re-perception mechanism is implemented to address potential perception errors by evaluating the credibility of detected objects.
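To make the methodology concrete, here is a minimal Python sketch of an incrementally updated object/group/room graph that is serialized into a coarse-to-fine prompt. It is an illustration under stated assumptions, not the authors' implementation: the `SceneGraph` class, its method names, the relation strings, the prompt wording, and the additive credibility score with a fixed threshold are all hypothetical. The summary says only that credibility is evaluated, not how.

```python
from dataclasses import dataclass, field

LEVELS = ("object", "group", "room")  # hierarchy levels named in the summary


@dataclass
class SceneGraph:
    """Toy hierarchical scene graph; every name here is illustrative."""
    nodes: dict = field(default_factory=dict)  # id -> {"level", "label", "credibility"}
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, level, label):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        # Incremental update: re-adding a known node leaves it unchanged.
        self.nodes.setdefault(
            node_id, {"level": level, "label": label, "credibility": 0.0}
        )

    def add_edge(self, src, relation, dst):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = (src, relation, dst)
        if edge not in self.edges:  # skip duplicates from repeated observations
            self.edges.append(edge)

    def observe(self, node_id, score):
        # Assumed re-perception rule: accumulate evidence across frames.
        self.nodes[node_id]["credibility"] += score

    def is_credible(self, node_id, threshold=1.0):
        # Assumed decision rule: trust a detection once its score passes the threshold.
        return self.nodes[node_id]["credibility"] >= threshold

    def to_prompt(self, goal):
        # List rooms first, then groups, then objects, then the relations.
        lines = [f"Goal object: {goal}"]
        for level in reversed(LEVELS):
            labels = sorted(
                n["label"] for n in self.nodes.values() if n["level"] == level
            )
            lines.append(f"{level}s: {', '.join(labels) if labels else 'none'}")
        for src, rel, dst in self.edges:
            lines.append(f"{self.nodes[src]['label']} {rel} {self.nodes[dst]['label']}")
        lines.append("Step by step, which region most likely contains the goal?")
        return "\n".join(lines)


# Usage: build the graph as observations arrive, then serialize it for the LLM.
graph = SceneGraph()
graph.add_node("r1", "room", "living room")
graph.add_node("g1", "group", "sofa set")
graph.add_node("o1", "object", "sofa")
graph.add_edge("o1", "belongs to", "g1")
graph.add_edge("g1", "located in", "r1")
graph.observe("o1", 0.6)  # first sighting: below the threshold
graph.observe("o1", 0.6)  # second sighting: accumulated score passes it
prompt = graph.to_prompt("tv")
```

Listing rooms before groups and objects mirrors the hierarchical ordering the summary describes; a real system would attach geometry and detector outputs to each node.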
Key Findings: Evaluations on the MP3D, HM3D, and RoboTHOR environments show that SG-Nav outperforms state-of-the-art zero-shot object navigation methods, improving success rate by more than 10% on every benchmark. Notably, SG-Nav even surpasses some supervised methods on the challenging MP3D dataset.
Main Conclusions: The integration of 3D scene graphs and hierarchical chain-of-thought prompting enables LLMs to effectively reason about spatial relationships and make informed decisions for zero-shot object navigation. The proposed graph-based re-perception mechanism enhances the robustness of the framework by mitigating the impact of perception errors.
Significance: SG-Nav advances zero-shot object navigation by demonstrating the potential of combining LLMs with structured scene representations. Because the LLM's reasoning process can be summarized, the framework's decisions are explainable, which holds promise for enhancing human-agent interaction in navigation tasks.
Limitations and Future Research: The reliance on online 3D instance segmentation for scene graph construction presents a limitation, as current methods are not fully end-to-end or 3D-aware. Future research could explore the development of more robust and efficient 3D scene understanding techniques. Additionally, extending SG-Nav to handle a wider range of navigation tasks, such as image-goal navigation and vision-and-language navigation, presents promising avenues for future work.
Key insights distilled from the paper by Hang Yin, Xi... on arxiv.org, 10-11-2024
https://arxiv.org/pdf/2410.08189.pdf