Leveraging Large Language and Vision Models for Efficient Exploration and Object Goal Navigation in Unfamiliar 3D Environments
A framework that uses the reasoning capabilities of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) to construct a semantically rich, goal-oriented 3D scene representation, and uses that representation to explore and navigate an unfamiliar 3D environment efficiently in search of a target object.