Key Concepts
Query-dependent prompt optimization based on offline inverse reinforcement learning (Offline Inverse RL) enhances arithmetic reasoning in Large Language Models.
Abstract
The study aims to improve arithmetic reasoning in Large Language Models (LLMs) through query-dependent prompt optimization.
It introduces Prompt-OIRL, a method built on offline inverse reinforcement learning.
It highlights the challenges of evaluating and optimizing prompts.
It demonstrates that Prompt-OIRL is both cost-efficient and effective.
It validates the approach across various LLMs and arithmetic reasoning datasets.
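The query-dependent selection step can be illustrated with a minimal sketch. This is not the authors' implementation: `select_prompt`, `answer`, `toy_reward_model`, and the candidate prompts are hypothetical names chosen for illustration, and the toy scoring rule merely stands in for a reward model learned offline. The sketch only shows the control flow the summary describes: every candidate prompt is scored for the given query without calling the LLM, and only the highest-scoring prompt is sent to inference.

```python
from typing import Callable, Sequence

# Hypothetical signature: a reward model maps (query, prompt) to a score.
RewardModel = Callable[[str, str], float]


def select_prompt(query: str,
                  candidate_prompts: Sequence[str],
                  reward_model: RewardModel) -> str:
    """Return the candidate prompt with the highest predicted reward.

    No LLM call happens here; only the offline reward model is queried.
    """
    if not candidate_prompts:
        raise ValueError("candidate_prompts must not be empty")
    return max(candidate_prompts, key=lambda p: reward_model(query, p))


def answer(query: str,
           candidate_prompts: Sequence[str],
           reward_model: RewardModel,
           llm: Callable[[str], str]) -> str:
    """Run LLM inference once, using only the selected prompt."""
    prompt = select_prompt(query, candidate_prompts, reward_model)
    return llm(f"{query}\n{prompt}")


def toy_reward_model(query: str, prompt: str) -> float:
    """Stand-in for a learned reward model (an assumption, not the paper's).

    Toy rule: long queries are matched with step-by-step prompts,
    short queries with direct prompts.
    """
    is_long = len(query.split()) > 12
    wants_steps = "step by step" in prompt.lower()
    return 1.0 if is_long == wants_steps else 0.0


CANDIDATES = ["Let's think step by step.", "Answer directly."]
```

With this toy rule, a short query such as "What is 2 + 3?" selects "Answer directly.", while a longer multi-step word problem selects "Let's think step by step.". In both cases `answer` calls the LLM exactly once.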
Statistics
"Our method optimizes prompt during inference on a query-dependent level effectively and cost-efficiently."
"The optimal prompt is chosen without LLM interaction, ensuring only the chosen prompt undergoes inference."
"Prompt-OIRL utilizes the offline reward model to pinpoint the most suitable prompt."
Quotes
"Our method optimizes prompt during inference on a query-dependent level effectively and cost-efficiently."