This work introduces Bilevel-LLM, a framework that combines prompt-question selection with Chain-of-Thought (CoT) reasoning to improve decision-making with large language models. By choosing appropriate prompts and interpreting the resulting CoT outputs, the framework outperforms baselines in challenging environments such as Overcooked and FourRoom.

Large language models (LLMs) show promise on complex tasks when action-based policies are combined with CoT reasoning, but performance depends heavily on prompt quality. This motivates a leader-follower bilevel framework: the leader selects prompts that elicit useful CoT reasoning, while the follower learns an action policy conditioned on that reasoning, incorporating human prior knowledge into policy learning.

The framework thus generates prompts for decision-making tasks automatically, reducing the need for manual prompt engineering. By integrating CoT reasoning with reinforcement-learned action policies, Bilevel-LLM outperforms existing baselines across a range of challenging environments.
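The leader-follower structure described above can be sketched as a simple loop: the leader picks a prompt, the LLM produces CoT text, and the follower maps that text to an action. The sketch below is illustrative only; all names (`leader_select_prompt`, `follower_act`, the stubbed `llm_cot`) are hypothetical stand-ins, not the paper's actual implementation.

```python
# Hedged sketch of a leader-follower bilevel decision loop.
# All function names and the prompt/action sets are illustrative
# assumptions, not taken from the Bilevel-LLM paper itself.

PROMPTS = ["Describe the next subgoal.", "List obstacles, then act."]
ACTIONS = ["up", "down", "left", "right", "interact"]

def llm_cot(prompt: str, observation: str) -> str:
    # Stand-in for an actual LLM call: returns a fixed-form CoT string.
    return f"{prompt} Given {observation}, move toward the goal."

def leader_select_prompt(observation: str, prompt_values: dict) -> str:
    # Leader policy: greedily pick the prompt with the highest learned value.
    return max(PROMPTS, key=lambda p: prompt_values[p])

def follower_act(cot_text: str) -> str:
    # Follower policy: toy deterministic mapping from CoT text to an action.
    return ACTIONS[len(cot_text) % len(ACTIONS)]

def episode_step(observation: str, prompt_values: dict):
    # One bilevel step: leader chooses prompt -> LLM reasons -> follower acts.
    prompt = leader_select_prompt(observation, prompt_values)
    cot = llm_cot(prompt, observation)
    action = follower_act(cot)
    return prompt, cot, action

if __name__ == "__main__":
    values = {p: 0.0 for p in PROMPTS}
    prompt, cot, action = episode_step("room_A", values)
    print(prompt, "->", action)
```

In the real framework, the leader's prompt values and the follower's policy would both be updated from environment reward, rather than fixed as they are in this stub.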