Core Concepts
Efficiently draw samples from the posterior distribution for robust reinforcement learning.
Abstract
The paper introduces the Langevinized Kalman Temporal Difference (LKTD) algorithm, which combines stochastic gradient MCMC (SGMCMC) with the Kalman filtering paradigm for deep reinforcement learning. The algorithm accounts for the stochasticity of agent-environment interactions and for dynamic changes in model parameters, and its posterior samples converge to a stationary distribution, enabling robust and adaptable RL.
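As a rough, hedged illustration of the "Langevinized" idea (an assumed sketch, not the authors' exact update rule): an SGMCMC-style step perturbs a stochastic-gradient TD update with Gaussian noise, so the parameter iterates behave like approximate samples from a posterior over the network weights rather than a single point estimate. The network, data, and step size below are illustrative placeholders.

```python
import torch
import torch.nn as nn

def sgld_td_step(model, td_loss, lr=1e-3):
    """One SGLD-style update: a gradient step on the TD loss plus injected
    Gaussian noise, so parameter iterates act as approximate posterior samples.
    (Illustrative sketch only, not the paper's exact LKTD update.)"""
    grads = torch.autograd.grad(td_loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            noise = torch.randn_like(p) * (2.0 * lr) ** 0.5  # Langevin noise
            p.add_(-lr * g + noise)

# Minimal usage with a toy Q-network and a dummy squared TD error.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
states = torch.randn(8, 4)
td_targets = torch.randn(8, 2)
td_loss = ((q_net(states) - td_targets) ** 2).mean()
sgld_td_step(q_net, td_loss)
```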
Abstract
- Introduces the LKTD algorithm, built on SGMCMC.
- Posterior samples converge to a stationary distribution, enabling uncertainty quantification.
Introduction
- Successes of RL across diverse tasks.
- Traditional methods overlook the stochasticity of agent-environment interactions.
Kalman Temporal Difference Algorithms
- Treat value estimates as random variables in order to track them throughout the policy learning process.
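As a minimal illustration of this viewpoint (illustrative notation, not the paper's): the value-function parameters are modeled as a hidden state with random-walk dynamics, and each TD target is treated as a noisy scalar observation, so a Kalman-style filter maintains a mean and covariance over the parameters. The sketch assumes a linear value model, so the observation Jacobian is just the feature vector.

```python
import numpy as np

def ktd_step(theta_mean, P, features, td_target, process_var=1e-4, obs_var=1.0):
    """One Kalman-style update for a linear value model V(s) = features @ theta.
    theta is treated as a hidden state with random-walk dynamics."""
    # Predict: theta_t = theta_{t-1} + w_t, so only the covariance grows.
    P = P + process_var * np.eye(P.shape[0])

    # Update: treat the TD target as a noisy observation of features @ theta.
    H = features.reshape(1, -1)               # observation Jacobian (linear model)
    residual = td_target - float(H @ theta_mean)
    S = float(H @ P @ H.T) + obs_var          # innovation variance
    K = (P @ H.T) / S                         # Kalman gain, shape (p, 1)
    theta_mean = theta_mean + (K * residual).ravel()
    P = P - K @ H @ P
    return theta_mean, P

# Tiny usage example with a 3-dimensional feature vector.
theta, P = np.zeros(3), np.eye(3)
theta, P = ktd_step(theta, P, features=np.array([1.0, 0.5, -0.2]), td_target=1.0)
```

For a nonlinear value network, classical KTD replaces H with a linearization of the network, which is one of the costs the LKTD algorithm avoids (see the quote below about handling a nonlinear h without a linearization operator).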
Langevinized Kalman Temporal Difference Algorithm
- Overcomes limitations of existing KTD algorithms.
- Efficient value tracking and scalability for large-scale neural networks.
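A hedged sketch of what value tracking with posterior samples could look like in practice: repeatedly apply a noisy (Langevin-style) TD update like the one sketched under the Abstract, keep a sliding window of recent parameter snapshots as approximate posterior samples, and use their spread to quantify uncertainty in the Q-values. The mini-batches, window size, and network here are placeholders, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

def noisy_td_update(model, td_loss, lr=1e-3):
    # Same SGLD-style step as in the earlier sketch: gradient step plus noise.
    grads = torch.autograd.grad(td_loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(-lr * g + torch.randn_like(p) * (2.0 * lr) ** 0.5)

q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
posterior_samples = []

for step in range(200):
    # Placeholder mini-batch; in practice it would come from agent-environment
    # interaction or a replay buffer.
    states = torch.randn(16, 4)
    td_targets = torch.randn(16, 2)
    td_loss = ((q_net(states) - td_targets) ** 2).mean()
    noisy_td_update(q_net, td_loss)

    # Keep a window of recent snapshots as approximate posterior samples.
    if step % 10 == 0:
        posterior_samples.append({k: v.clone() for k, v in q_net.state_dict().items()})
        posterior_samples = posterior_samples[-20:]

# Uncertainty quantification: evaluate Q(s, .) under each sample and inspect the spread.
s = torch.randn(1, 4)
with torch.no_grad():
    q_values = []
    for sample in posterior_samples:
        q_net.load_state_dict(sample)
        q_values.append(q_net(s))
    q_stack = torch.stack(q_values)
print("Q mean:", q_stack.mean(dim=0), "Q std:", q_stack.std(dim=0))
```

Because each update is just a gradient step plus injected noise, the per-iteration cost stays linear in the number of network parameters, which is what makes this style of tracking plausible for large networks (cf. the O(np) per-iteration complexity quoted below).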
Experiments
- Comparison with DQN, BootDQN, and QR-DQN across various environments.
- LKTD achieves higher training rewards and explores optimal policies more effectively.
Conclusion
- LKTD improves the efficiency and precision of deep RL optimization.
Stats
Under mild conditions, we prove that the posterior samples generated by the LKTD algorithm converge to a stationary distribution.
Quotes
"The proposed algorithm enables fast value tracking at a complexity of O(np) per iteration."
"Compared to existing KTD algorithms, the proposed algorithm can directly handle a nonlinear function h(·, ·) without the need for a linearization operator."