The paper addresses the steady-state errors that often arise when quadratic reward functions are used in reinforcement learning (RL). Absolute-value-type reward functions can alleviate this problem, but they tend to induce large fluctuations in certain system states, producing abrupt changes.
To address this challenge, the study integrates an integral term into quadratic-type reward functions. The added term helps the RL algorithm take the reward history into account, thereby mitigating steady-state errors.
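To make the contrast concrete, the following is a minimal Python sketch of the three reward shapes discussed here, written for a generic tracking error e = x - x_ref; the weights w and w_i and the specific form of the integral penalty are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

# Illustrative sketch only: three candidate reward shapes for a tracking
# error e = x - x_ref. The weights w and w_i are hypothetical placeholders,
# not values taken from the paper.

def quadratic_reward(e, w=1.0):
    # Gradient vanishes near e = 0, so small residual (steady-state)
    # errors are only weakly penalized.
    return -w * e**2

def absolute_value_reward(e, w=1.0):
    # Constant gradient magnitude penalizes small errors more strongly,
    # but can cause abrupt changes in some system states.
    return -w * abs(e)

def pi_quadratic_reward(e, integral_e, w=1.0, w_i=0.1):
    # Quadratic term plus a penalty on the accumulated error, analogous
    # to the integral action of a PI controller.
    return -(w * e**2 + w_i * integral_e**2)

# Example: evaluate the PI-type reward along a decaying error trajectory.
errors = np.array([1.0, 0.6, 0.3, 0.1, 0.05])
integral_e = 0.0
rewards = []
for e in errors:
    integral_e += e            # running sum of the tracking error
    rewards.append(pi_quadratic_reward(e, integral_e))
```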
The paper first reviews the basic theory of reinforcement learning and the Deep Deterministic Policy Gradient (DDPG) algorithm. It then outlines two methods for incorporating the integral term into the reward function (a sketch of both follows the list):
Method 1: The error is accumulated only after a certain threshold time has elapsed, which allows relatively large weight values that reduce steady-state errors while improving convergence speed.
Method 2: The accumulated error is weighted by a sigmoid function over time, gradually increasing the emphasis on steady-state errors to accelerate convergence.
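Since the list above describes the two schemes only verbally, here is a minimal sketch of how the accumulated error might be gated after a threshold time (Method 1) or weighted by a sigmoid of time (Method 2) before entering the reward. The threshold time, sigmoid parameters, time step, and weights are hypothetical placeholders chosen for illustration, not values from the paper.

```python
import numpy as np

def method1_integral(errors, dt=0.1, t_threshold=2.0):
    """Method 1: start accumulating the error only after t_threshold."""
    integral_e = 0.0
    for step, e in enumerate(errors):
        t = step * dt
        if t >= t_threshold:       # ignore the transient phase entirely
            integral_e += e * dt
    return integral_e

def method2_integral(errors, dt=0.1, k=2.0, t_mid=2.0):
    """Method 2: weight the accumulated error by a sigmoid of time."""
    integral_e = 0.0
    for step, e in enumerate(errors):
        t = step * dt
        sigma = 1.0 / (1.0 + np.exp(-k * (t - t_mid)))  # ramps from 0 to 1
        integral_e += sigma * e * dt
    return integral_e

def pi_reward(e, integral_e, w=1.0, w_i=0.5):
    # Quadratic penalty on the instantaneous error plus a penalty on the
    # gated or sigmoid-weighted accumulated error.
    return -(w * e**2 + w_i * integral_e**2)

# Example usage on a decaying error trajectory.
errors = [1.0, 0.6, 0.3, 0.1, 0.05, 0.02]
r1 = pi_reward(errors[-1], method1_integral(errors))
r2 = pi_reward(errors[-1], method2_integral(errors))
```

Both schemes keep the integral penalty small during the early transient, so the agent is not punished for errors it cannot yet have corrected, and emphasize the accumulated error later, when residual steady-state error is what matters.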
The proposed methods are validated in experiments on Adaptive Cruise Control (ACC) and lane-change models. The results show that the PI-based quadratic reward functions effectively reduce steady-state errors without causing significant spikes in system states, outperforming both the quadratic and absolute-value reward functions.