Core Concepts
StepAgent is a framework for training Large Language Model (LLM) agents that uses step-wise reinforcement learning to overcome the sparse reward signals of trajectory-level training methods, yielding more efficient and effective policy optimization.
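The sparse-vs-step-wise distinction can be illustrated with a minimal sketch (this is not StepAgent's actual algorithm; the reward values and helper function are hypothetical): with a sparse reward, only the final outcome carries signal, so early steps receive a heavily discounted return, whereas step-wise rewards give every action a direct learning signal.

```python
# Illustrative sketch only: contrast sparse episode-level reward with
# hypothetical step-wise rewards when computing discounted returns
# for policy-gradient training.

def discounted_returns(rewards, gamma=0.99):
    """Compute the return G_t = r_t + gamma * G_{t+1} for each step."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Sparse reward: the agent only sees the final outcome (success = 1.0);
# the first step's return is just gamma^3, carrying no step-level credit.
sparse = [0.0, 0.0, 0.0, 1.0]

# Step-wise reward: each intermediate action gets its own (hypothetical)
# score, so every step contributes a distinct learning signal.
stepwise = [0.2, 0.3, 0.1, 1.0]

print(discounted_returns(sparse))
print(discounted_returns(stepwise))
```

Under sparse rewards, all early steps share essentially the same discounted signal, which makes credit assignment across a long trajectory difficult; step-wise rewards differentiate them.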
Stats
StepAgent surpasses the state-of-the-art ETO model by 2.9% absolute on the HotpotQA dataset.
Removing the step-wise reward from the StepAgent framework causes a clear performance drop across all tasks (e.g., WebShop: 68.0 → 67.2; ScienceWorld: 64.8 → 63.6).