Key Concepts
Proposes π-KRVI, an algorithm that efficiently handles large state-action spaces with order-optimal regret guarantees.
Abstract
Kernelized reinforcement learning handles large state-action spaces by modeling value functions in a reproducing kernel Hilbert space. The proposed π-KRVI policy, an optimistic variant of least-squares value iteration, combines domain partitioning with kernel ridge regression to achieve sublinear regret bounds. The analysis focuses on kernels with polynomial eigendecay, such as the Matérn family. Confidence intervals derived from the kernel ridge regression estimates are central to both the design and the analysis of π-KRVI.
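As a concrete illustration of the estimator underlying this approach, the sketch below implements kernel ridge regression with an optimistic confidence width on a single subdomain. It is a minimal sketch, not the authors' implementation: the Matérn-3/2 kernel, the function names, and the confidence multiplier beta are illustrative assumptions.

import numpy as np

def matern32(X, Y, lengthscale=1.0):
    """Matern kernel with smoothness nu = 3/2 (an illustrative choice)."""
    dists = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    s = np.sqrt(3.0) * dists / lengthscale
    return (1.0 + s) * np.exp(-s)

def krr_optimistic(Z, y, Z_query, lam=1.0, beta=1.0):
    """Kernel ridge regression mean plus a confidence width.

    Z       : (n, d) observed state-action points in one subdomain
    y       : (n,)   regression targets (e.g. backed-up values)
    Z_query : (m, d) query points
    beta    : confidence multiplier -- hypothetical; the paper's
              analysis prescribes its own schedule for this quantity
    """
    K = matern32(Z, Z)                      # Gram matrix of observations
    k_q = matern32(Z_query, Z)              # cross-kernel, shape (m, n)
    A = K + lam * np.eye(len(Z))            # regularized Gram matrix
    mean = k_q @ np.linalg.solve(A, y)      # KRR prediction
    # A posterior-style variance supplies the confidence width.
    V = np.linalg.solve(A, k_q.T)           # shape (n, m)
    var = matern32(Z_query, Z_query).diagonal() - np.sum(k_q * V.T, axis=1)
    width = beta * np.sqrt(np.maximum(var, 0.0))
    return mean, mean + width               # optimistic upper estimate

# Example: fit on 50 random points in one partition cell, query 5 more.
rng = np.random.default_rng(0)
Z = rng.uniform(size=(50, 3))
y = rng.normal(size=50)
mean, ucb = krr_optimistic(Z, y, rng.uniform(size=(5, 3)))

In an optimistic least-squares value iteration, an upper estimate of this kind would serve as the backed-up Q-value; under domain partitioning, each such fit only sees the samples falling in its own subdomain, which keeps the Gram matrices small and local.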
Statistics
A regret bound of $\tilde{\mathcal{O}}\big(H^2 T^{\frac{d+\alpha/2}{d+\alpha}}\big)$ is achieved (see the Matérn instantiation after these statistics).
Maximum information gain: $\Gamma_{k,\lambda}(T) = \mathcal{O}\big(T^{1/\tilde{p}} \log(T)^{1-1/\tilde{p}} \rho_{\mathcal{Z}}^{\alpha/\tilde{p}}\big)$.
Covering number: $\log N_{k,h}(\epsilon; R, B) = \mathcal{O}\Big(\big(\tfrac{R^2 \rho_{\mathcal{Z}}^{\alpha}}{\epsilon^2}\big)^{\frac{1}{\tilde{p}-1}}\big(1 + \log(R/\epsilon)\big) + \big(\tfrac{B^2 \rho_{\mathcal{Z}}^{\alpha}}{\epsilon^2}\big)^{\frac{1}{\tilde{p}-1}}\big(1 + \log(B/\epsilon)\big)\Big)$.
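These rates can be made concrete for the Matérn family. Assuming that $\tilde{p}$ denotes the polynomial eigendecay rate of the kernel ($\lambda_m = \mathcal{O}(m^{-\tilde{p}})$) and that $\alpha$ corresponds to twice the Matérn smoothness $\nu$, both assumed readings of the notation rather than statements from the paper, the Matérn-$\nu$ kernel on a $d$-dimensional domain has $\tilde{p} = (2\nu + d)/d$, and the statistics above specialize (suppressing the domain-size factor $\rho_{\mathcal{Z}}^{\alpha}$ and logarithmic terms) to

\[
\Gamma_{k,\lambda}(T) = \mathcal{O}\!\left(T^{\frac{d}{2\nu+d}} \log(T)^{\frac{2\nu}{2\nu+d}}\right),
\qquad
\log N_{k,h}(\epsilon; R, B) = \mathcal{O}\!\left((R/\epsilon)^{d/\nu} + (B/\epsilon)^{d/\nu}\right),
\]

since $2/(\tilde{p}-1) = d/\nu$. Heuristically, plugging this information gain into a $\sqrt{T \cdot \Gamma_{k,\lambda}(T)}$-type regret argument yields the exponent $\frac{d+\nu}{d+2\nu} = \frac{d+\alpha/2}{d+\alpha}$ of the regret bound above, matching the $\Omega\big(T^{\frac{d+\nu}{d+2\nu}}\big)$ lower bound for Matérn kernel bandits (Scarlett et al., 2017); this is the sense in which the bound is order optimal.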
Quotes
"We propose π-KRVI, an optimistic modification of least-squares value iteration."
"Our results show a significant improvement over the state of the art in handling large state-action spaces."
"The regret bound is sublinear and order optimal for Matérn kernels."