The paper introduces Bilevel-LLM, a novel framework that combines prompt questions with Chain-of-Thought (CoT) reasoning to enhance decision-making with large language models. By selecting appropriate prompts and interpreting the resulting CoT outputs effectively, the framework outperforms baselines in complex environments such as Overcooked and FourRoom.
Large language models (LLMs) show potential for solving complex tasks when action-based policies are combined with CoT reasoning, but effective performance hinges on prompt quality. This motivates a new leader-follower bilevel framework that incorporates human prior knowledge into action-policy learning, yielding stronger decision-making capabilities.
The proposed framework automatically generates prompts for decision-making tasks, reducing the need for extensive human labor. By integrating CoT reasoning with reinforcement-learned action policies, Bilevel-LLM outperforms existing baselines across a range of challenging environments.
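To make the leader-follower structure concrete, the following is a minimal Python sketch of one rollout of such a bilevel loop: a leader policy picks a prompt question, an LLM produces CoT reasoning from it, and a follower action policy conditions on that reasoning. All names here (PromptPolicy, query_llm_cot, ActionPolicy, GridEnv) and the reset/step interface are hypothetical placeholders for illustration, not the paper's actual implementation; in Bilevel-LLM both policies would be trained with reinforcement learning rather than the random or fixed choices shown.

```python
# Illustrative sketch only: placeholder names, not the paper's API.
import random
from typing import List, Tuple

PROMPT_POOL: List[str] = [
    "What should the agent do next to finish the current subtask?",
    "Which object should the agent pick up first?",
    "Is the agent blocked, and how can it reroute?",
]

class PromptPolicy:
    """Leader: selects a prompt question for the current observation."""
    def select(self, observation: str) -> str:
        # Placeholder: a learned leader would score prompts by expected return.
        return random.choice(PROMPT_POOL)

def query_llm_cot(observation: str, prompt: str) -> str:
    """Placeholder for an LLM call that returns Chain-of-Thought text."""
    return f"Reasoning about '{observation}' given the question '{prompt}'."

class ActionPolicy:
    """Follower: maps observation plus CoT reasoning to an environment action."""
    def act(self, observation: str, cot: str) -> str:
        # Placeholder: a learned follower would embed (observation, cot)
        # and output an action distribution.
        return "move_up"

def rollout(env, prompt_policy: PromptPolicy, action_policy: ActionPolicy,
            max_steps: int = 50) -> float:
    """One episode of the bilevel loop; returns the episode return."""
    obs, total_reward = env.reset(), 0.0
    for _ in range(max_steps):
        prompt = prompt_policy.select(obs)      # leader decision
        cot = query_llm_cot(obs, prompt)        # CoT generated by the LLM
        action = action_policy.act(obs, cot)    # follower decision
        obs, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

class GridEnv:
    """Toy stand-in environment with a conventional reset/step interface."""
    def __init__(self) -> None:
        self.t = 0
    def reset(self) -> str:
        self.t = 0
        return "agent at start cell"
    def step(self, action: str) -> Tuple[str, float, bool]:
        self.t += 1
        done = self.t >= 5
        return "agent moved", 1.0 if done else 0.0, done

if __name__ == "__main__":
    ret = rollout(GridEnv(), PromptPolicy(), ActionPolicy())
    print(f"episode return: {ret}")
```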
Source: Xue Yan, Yan ..., arxiv.org, 03-01-2024
https://arxiv.org/pdf/2310.18127.pdf