Core Concepts
Large Language Models can be leveraged as Reinforcement Learning agents to optimize workflows through iterative prompting and policy learning.
Abstract
The paper introduces a framework for formulating Reinforcement Learning (RL) problems as prompting tasks for Large Language Models (LLMs). The key contributions are:
- An iterative prompting strategy to communicate RL problem requirements to LLMs, including Markov Decision Process (MDP) elements like states, actions, and rewards.
- Integration of episode generation and simulation into the prompting chain to enable LLM-based policy learning through Q-Learning.
- Demonstration of the approach through two case studies: optimizing a Research Scientist's workflow and a Legal Matter Intake process.
The authors first define the RL problem in MDP terms and translate it into a series of prompts for the LLM. The LLM is then iteratively prompted to simulate episodes, learn the Q-table, and extract the optimal policy. The approach leverages the reasoning and problem-solving capabilities of LLMs to align them for RL task optimization.
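The loop the LLM is prompted to carry out (simulate episodes, update the Q-table, extract the greedy policy) mirrors classical tabular Q-learning. A minimal sketch of that classical loop, on a toy 3-state MDP that is illustrative rather than from the paper:

```python
import random

random.seed(0)  # reproducible exploration

# Toy MDP (illustrative, not from the paper): "shortcut" reaches End in one
# step, "detour" passes through Work first and so collects one extra -1 reward.
STATES = ["Start", "Work", "End"]
ACTIONS = {"Start": ["shortcut", "detour"], "Work": ["finish"]}
TRANSITIONS = {
    ("Start", "shortcut"): "End",
    ("Start", "detour"): "Work",
    ("Work", "finish"): "End",
}
REWARDS = {s: -1 for s in STATES}
REWARDS["End"] = 0  # terminal state carries zero reward

ALPHA, GAMMA, EPISODES = 0.5, 0.9, 50
Q = {s: {a: 0.0 for a in ACTIONS.get(s, [])} for s in STATES}

for _ in range(EPISODES):              # episode generation
    s = "Start"
    while s != "End":
        a = random.choice(ACTIONS[s])  # exploratory action choice
        s_next = TRANSITIONS[(s, a)]
        best_next = max(Q[s_next].values(), default=0.0)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += ALPHA * (REWARDS[s_next] + GAMMA * best_next - Q[s][a])
        s = s_next

# Policy extraction: greedy action in each non-terminal state.
policy = {s: max(Q[s], key=Q[s].get) for s in STATES if Q[s]}
print(policy)  # {'Start': 'shortcut', 'Work': 'finish'}
```

Because every non-terminal step costs -1, the learned policy prefers the shorter route, which is exactly the pressure toward streamlined workflows that the paper's reward design exploits.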
The case studies show that the framework can effectively optimize the workflows, with the LLM converging to the optimal policy within 2 iterations on average. The authors discuss the potential to handle more complex RL problems as the capabilities of LLMs continue to grow, as well as opportunities to integrate the approach with multimodal LLMs for more comprehensive workflow planning.
Stats
The workflow for a research scientist consists of the following states: Start, Initiate Research, Literature Review, Experiment Plan, Experiment Execution, Data Analysis, Manuscript Drafting, Submission to Venue, Revision of Manuscript, Peer Review, Result Publication, and End.
The possible actions from each state are defined, e.g., from the Start state, the only possible action is Initiate Research.
A reward of -1 is assigned to every state except the End state, which receives a reward of 0.
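The state and reward definitions above translate directly into data structures. Only the single action given in the text (Start → Initiate Research) is encoded here; the paper's full action map covers every non-terminal state:

```python
# States of the research-scientist workflow, as listed above.
STATES = [
    "Start", "Initiate Research", "Literature Review", "Experiment Plan",
    "Experiment Execution", "Data Analysis", "Manuscript Drafting",
    "Submission to Venue", "Revision of Manuscript", "Peer Review",
    "Result Publication", "End",
]

# Only the action the text spells out; further transitions would be assumptions.
ACTIONS = {"Start": ["Initiate Research"]}

# Reward of -1 in every state except the terminal End state (reward 0), so a
# trajectory's return is minus the number of non-terminal states it visits.
def reward(state: str) -> int:
    return 0 if state == "End" else -1
```

Under this scheme, maximizing return is equivalent to completing the workflow in as few steps as possible.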
The workflow for legal matter intake consists of the following states: Start, Matter Intake, Conflict Assessment, Initial Assessment, Client Communication, Fee and Payment, Proposal Preparation, Proposal Review, Case Management, Billing, and End.
The possible actions from each state are defined, e.g., from the Matter Intake state, the possible actions are Conflict Assessment and Initial Assessment.
A reward of -1 is assigned to every state except the End state, which receives a reward of 0.
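Unlike the research-scientist workflow, the Matter Intake state branches, so the learned policy must actually choose between the two assessment routes. A value-iteration sketch of why the -1 step reward resolves that choice; only the two actions out of Matter Intake come from the text, and the remaining transitions (including the collapsed tail to End) are hypothetical:

```python
# Hypothetical transition structure around the Matter Intake branch.
# In this illustration, Conflict Assessment is a detour that still passes
# through Initial Assessment, so it costs one extra -1 step.
TRANSITIONS = {
    "Start": {"Matter Intake": "Matter Intake"},
    "Matter Intake": {"Conflict Assessment": "Conflict Assessment",
                      "Initial Assessment": "Initial Assessment"},
    "Conflict Assessment": {"Initial Assessment": "Initial Assessment"},
    "Initial Assessment": {"Client Communication": "Client Communication"},
    "Client Communication": {"End": "End"},  # collapsed tail, illustrative
}

def reward(state: str) -> int:
    return 0 if state == "End" else -1

# Undiscounted value iteration: V(s) = max_a [ r(s') + V(s') ], V(End) = 0.
V = {s: 0.0 for s in list(TRANSITIONS) + ["End"]}
for _ in range(10):  # a few sweeps suffice on this small acyclic graph
    for s, acts in TRANSITIONS.items():
        V[s] = max(reward(s2) + V[s2] for s2 in acts.values())

# Greedy choice at the branch point: take the action whose successor has the
# higher one-step-lookahead value.
branch = TRANSITIONS["Matter Intake"]
best = max(branch, key=lambda a: reward(branch[a]) + V[branch[a]])
print(best)  # 'Initial Assessment' under these hypothetical path lengths
```

The point is not the specific routes, which are invented here, but that the uniform -1 reward makes value iteration (or Q-learning) prefer whichever branch reaches End in fewer steps.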
Quotes
"Large Language Models (LLMs) encapsulate an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of a variety of Natural Language Processing (NLP) tasks."
"We propose a new framework that leverages the reasoning and problem-solving capabilities of LLMs to align them for RL problem-solving."