Core Concepts
The authors develop a model-free test of the stationarity of the optimal Q-function based on pre-collected historical data, enabling policy optimization in nonstationary offline environments.
Abstract
The paper studies testing stationarity and change point detection in reinforcement learning. It introduces a novel test for assessing the stationarity of the optimal Q-function and proposes a sequential change point detection method. Theoretical contributions, simulation studies, and real data examples illustrate the effectiveness of the proposed procedures.
Key points include:
Offline reinforcement learning methods in nonstationary environments are explored.
A model-free test is developed to assess the stationarity of the optimal Q-function based on pre-collected historical data.
The paper presents various types of stationarity assumptions and analyzes their interrelationships.
Methodological contributions such as hypothesis testing and change point detection are discussed.
The proposed test statistics, the estimation of the Q-function, a bootstrap approach for computing critical values, and the change point detection procedure are detailed.
Consistency of the test is established under assumptions on the reward functions, transition functions, behavior policies, optimal policies, and basis functions.
The size and power properties of the proposed test are analyzed under specific conditions.
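To make the test-statistic and bootstrap ideas above concrete, here is a minimal, self-contained sketch of a CUSUM-style change point test on a scalar series. This is an illustration only, not the paper's procedure: the authors' test operates on estimated optimal Q-functions with a multiplier bootstrap, whereas this sketch uses a toy reward-like series and a permutation surrogate for the null distribution. All function names and parameters (`cusum_statistic`, `bootstrap_critical_value`, `n_boot`, `alpha`) are hypothetical.

```python
import numpy as np

def cusum_statistic(x):
    """Max absolute normalized CUSUM deviation over candidate change points.

    Under stationarity (H0), partial sums track their pooled expectation,
    so large deviations suggest a change point. Toy scalar analogue of the
    paper's Q-function-based statistic.
    """
    n = len(x)
    total = x.sum()
    # deviation of each partial sum from its H0 expectation, scaled by sqrt(n)
    return max(abs(x[:u].sum() - (u / n) * total) / np.sqrt(n)
               for u in range(1, n))

def bootstrap_critical_value(x, n_boot=500, alpha=0.05, seed=None):
    """Permutation surrogate for the bootstrap critical value.

    Permuting the series enforces stationarity, giving a null reference
    distribution for the CUSUM statistic. (The paper uses a multiplier
    bootstrap instead; this is a simpler stand-in.)
    """
    rng = np.random.default_rng(seed)
    null_stats = [cusum_statistic(rng.permutation(x)) for _ in range(n_boot)]
    return np.quantile(null_stats, 1 - alpha)

# Toy data with a mean shift halfway through (a nonstationary series).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 100)])

stat = cusum_statistic(x)
crit = bootstrap_critical_value(x, seed=1)
reject = stat > crit  # True -> evidence against stationarity
```

A sequential detection procedure, as in the paper, would apply such a test repeatedly over a shrinking data window and take the most recent interval on which stationarity is not rejected.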
Stats
"A Python implementation of the proposed procedure is available at https://github.com/limengbinggz/CUSUM-RL."
"Keywords and phrases: Reinforcement learning, Nonstationarity, Hypothesis testing, Change point detection."