Evaluating the Planning Abilities of Large Language Models Using GameTraversalBenchmark: Can LLMs Traverse 2D Game Maps Effectively?
Large language models (LLMs) struggle with planning tasks, as shown by their performance on the GameTraversalBenchmark (GTB), which evaluates their ability to navigate 2D game maps. These results highlight the need for further research to improve LLM planning capabilities.
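
To make the idea of traversing a 2D game map concrete, the following is a minimal sketch of how a model-proposed move sequence could be checked against a grid map. It is not GTB's actual scoring procedure. The map encoding (`#` for walls, `.` for floor), the move letters `U`/`D`/`L`/`R`, and the function names `follow_path`, `shortest_path_length`, and `evaluate_plan` are all assumptions introduced for illustration.

```python
from collections import deque

# Hypothetical encoding (not taken from GTB): '#' is a wall, '.' is walkable.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def is_walkable(grid, r, c):
    """True if (r, c) lies inside the grid and is not a wall."""
    return 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] != "#"


def follow_path(grid, start, moves):
    """Apply each move in order; return the final cell, or None on an illegal move."""
    r, c = start
    for m in moves:
        if m not in MOVES:
            return None  # unknown move symbol counts as illegal
        dr, dc = MOVES[m]
        r, c = r + dr, c + dc
        if not is_walkable(grid, r, c):
            return None  # walked into a wall or off the map
    return (r, c)


def shortest_path_length(grid, start, goal):
    """Breadth-first search; return the minimum step count, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if is_walkable(grid, *nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def evaluate_plan(grid, start, goal, moves):
    """Report whether the plan reaches the goal and how it compares to the BFS optimum."""
    end = follow_path(grid, start, moves)
    return {
        "reached": end == goal,
        "steps": len(moves),
        "optimal_steps": shortest_path_length(grid, start, goal),
    }


# Toy map invented for this example.
grid = [
    "....",
    ".##.",
    "....",
]
print(evaluate_plan(grid, (0, 0), (2, 3), "RRRDD"))
print(evaluate_plan(grid, (0, 0), (2, 3), "RD"))
```

In this toy setup, the first plan reaches the goal in the same number of steps as the breadth-first optimum, while the second walks into a wall and is counted as a failure. Comparing a plan's length against a search-based optimum is one simple way to quantify planning quality, under the assumptions stated above.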