Kernelized reinforcement learning aims to handle large state-action spaces efficiently by working in reproducing kernel Hilbert spaces (RKHSs). The proposed π-KRVI policy combines domain partitioning with kernel ridge regression to achieve sublinear regret bounds. The analysis covers complex models whose kernels have polynomially decaying eigenvalues, such as Matérn kernels. Confidence intervals play a central role in both the design and the analysis of RL algorithms, and they underpin the guarantees for π-KRVI across these settings.
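To make the role of kernel ridge regression and confidence intervals concrete, here is a minimal sketch of a generic kernel ridge estimate with an optimistic upper confidence bound under a Matérn kernel. It is not the π-KRVI algorithm from the paper: there is no domain partitioning or value iteration, inputs are scalars, and the names `matern52`, `krr_predict`, `ucb`, as well as the values of `lengthscale`, `lam`, and `beta`, are illustrative assumptions rather than quantities taken from the source.

```python
import math

def matern52(x, y, lengthscale=0.5):
    """Matern kernel with smoothness nu = 5/2 on scalar inputs (assumed setup)."""
    r = math.sqrt(5.0) * abs(x - y) / lengthscale
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def krr_predict(xs, ys, x_new, lam=0.1):
    """Kernel ridge mean and uncertainty width at x_new.

    mean  = k^T (K + lam I)^{-1} y
    sigma = sqrt(k(x, x) - k^T (K + lam I)^{-1} k)
    """
    n = len(xs)
    K = [[matern52(xs[i], xs[j]) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_vec = [matern52(x, x_new) for x in xs]
    alpha = solve(K, ys)
    mean = sum(a * k for a, k in zip(alpha, k_vec))
    w = solve(K, k_vec)
    var = matern52(x_new, x_new) - sum(wi * ki for wi, ki in zip(w, k_vec))
    return mean, math.sqrt(max(var, 0.0))

def ucb(xs, ys, x_new, beta=2.0, lam=0.1):
    """Optimistic estimate: mean plus beta times the uncertainty width."""
    mean, sigma = krr_predict(xs, ys, x_new, lam)
    return mean + beta * sigma

# Hypothetical observations of an unknown function on [0, 1].
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 0.0]
print(krr_predict(xs, ys, 0.5))  # narrow interval near observed data
print(krr_predict(xs, ys, 5.0))  # wide interval far from the data
```

The uncertainty width shrinks near observed points and grows toward the prior value far from them, which is the property an optimistic policy exploits when it acts on the upper bound. Restricting the regression to the observations inside one cell of a partition would only change which `xs` and `ys` are passed in.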
Key insights distilled from
by Sattar Vakil... at arxiv.org 03-15-2024
https://arxiv.org/pdf/2306.07745.pdf