The article introduces Bilevel-LLM, a framework that combines learned prompt questions with Chain-of-Thought (CoT) reasoning to improve decision-making with large language models. The framework outperforms baselines in challenging environments such as Overcooked and FourRoom by selecting appropriate prompts and interpreting the resulting CoT outputs effectively.
Large language models (LLMs) show promise on complex tasks when action-based policies are combined with Chain-of-Thought (CoT) reasoning, but performance depends heavily on prompt quality. This motivated a new leader-follower bilevel framework in which a prompt-generation policy (the leader) learns which questions to ask and an action policy (the follower) acts on the resulting reasoning. The framework incorporates human prior knowledge into action policy learning, yielding stronger decision-making.
The proposed framework aims to automatically generate prompts for decision-making tasks, reducing the need for extensive human labor. By integrating CoT reasoning and reinforced action policies, the Bilevel-LLM outperforms existing baselines across various challenging environments.
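To make the structure concrete, below is a minimal sketch of how such a leader-follower decision loop could be wired up. This is not the authors' implementation: the class names (PromptPolicy, ActionPolicy), the helpers (query_llm, embed_cot), the prompt pool, and the toy observation are all illustrative assumptions; a real system would call an actual LLM and train both policies with reinforcement learning on environment reward.

```python
"""Minimal sketch of a leader-follower (bilevel) prompt/action loop.

Illustrative only: class and function names, the prompt pool, and the toy
environment observation are assumptions, not the paper's code or API.
"""
import numpy as np

rng = np.random.default_rng(0)

# Candidate prompt questions: the leader's (discrete) action space.
PROMPT_POOL = [
    "What subgoal should the agent pursue next?",
    "Which object in the scene matters most right now?",
    "What is the safest next move?",
]

def query_llm(prompt: str, observation: np.ndarray) -> str:
    """Stand-in for an LLM call that returns Chain-of-Thought text."""
    return f"Reasoning about '{prompt}' for state {observation.round(2).tolist()}"

def embed_cot(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic text embedding standing in for a real sentence encoder."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

class PromptPolicy:
    """Leader: picks a prompt question conditioned on the observation."""
    def __init__(self, obs_dim: int, n_prompts: int):
        self.W = np.zeros((n_prompts, obs_dim))  # linear policy, just for the sketch

    def act(self, obs: np.ndarray) -> int:
        probs = softmax(self.W @ obs)
        return int(rng.choice(len(probs), p=probs))

class ActionPolicy:
    """Follower: maps [observation, CoT embedding] to an environment action."""
    def __init__(self, in_dim: int, n_actions: int):
        self.W = np.zeros((n_actions, in_dim))

    def act(self, feats: np.ndarray) -> int:
        probs = softmax(self.W @ feats)
        return int(rng.choice(len(probs), p=probs))

# One step of the interaction loop (RL updates for both policies omitted).
obs = rng.standard_normal(4)                               # toy observation
leader = PromptPolicy(obs_dim=4, n_prompts=len(PROMPT_POOL))
follower = ActionPolicy(in_dim=4 + 8, n_actions=5)

prompt_idx = leader.act(obs)                               # 1. leader asks a question
cot_text = query_llm(PROMPT_POOL[prompt_idx], obs)         # 2. LLM produces CoT reasoning
features = np.concatenate([obs, embed_cot(cot_text)])      # 3. follower reads obs + CoT
action = follower.act(features)                            # 4. action executed in the env
print(f"prompt: {PROMPT_POOL[prompt_idx]!r} -> action {action}")
```

In training, both policies would be updated from the environment reward (e.g., with policy-gradient methods), with the follower treated as a best response to the leader's prompt choices; that leader-follower dependence is what makes the formulation bilevel.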
Key insights taken from the original paper by Xue Yan, Yan ... (arxiv.org, 03-01-2024): https://arxiv.org/pdf/2310.18127.pdf