Core Concepts
Verifying stochastic RL policies using model checking is effective and versatile.
Summary
The content introduces a method for verifying stochastic reinforcement learning policies using model checking. The method is compatible with any RL algorithm, provided the Markov property holds. It integrates model checking with RL by combining three ingredients: a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula. The approach is demonstrated across multiple benchmarks, showing its suitability for verifying stochastic RL policies. The content also covers related work, background on probabilistic model checking and reinforcement learning, the methodology, experiments, analysis, and a conclusion.
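To make the pipeline concrete, below is a minimal sketch (not the paper's implementation) of the core construction: a stochastic policy pi collapses an MDP into a discrete-time Markov chain via M(s, s') = sum_a pi(a|s) * P(s'|s, a), after which a PCTL reachability probability such as P(◇goal) can be computed as a least fixed point. The toy road-crossing MDP, its state layout, and all probabilities are invented for illustration.

```python
import numpy as np

# Toy road-crossing MDP (illustrative only, not from the paper).
# States: 0 = start, 1 = goal, 2 = crash; actions: 0 = wait, 1 = cross.
S, A = 3, 2
P = np.zeros((S, A, S))          # P[s, a, s'] = transition probability
P[0, 0] = [0.9, 0.0, 0.1]        # wait: mostly stay put, small crash risk
P[0, 1] = [0.0, 0.8, 0.2]        # cross: 80% reach goal, 20% crash
P[1, :, 1] = 1.0                 # goal is absorbing
P[2, :, 2] = 1.0                 # crash is absorbing

pi = np.array([[0.5, 0.5],       # trained stochastic policy: pi[s, a]
               [1.0, 0.0],
               [1.0, 0.0]])

# Induced DTMC: M[s, s'] = sum_a pi[s, a] * P[s, a, s'].
M = np.einsum("sa,sat->st", pi, P)

def reachability(M, goal, iters=10_000, tol=1e-12):
    """P(eventually reach `goal`) as the least fixed point of
    x[s] = 1 if s in goal, else sum_{s'} M[s, s'] * x[s']."""
    x = np.zeros(M.shape[0])
    x[goal] = 1.0
    for _ in range(iters):
        x_next = M @ x
        x_next[goal] = 1.0       # goal states stay satisfied
        if np.max(np.abs(x_next - x)) < tol:
            break
        x = x_next
    return x

print(f"P(<>goal) from start = {reachability(M, goal=[1])[0]:.3f}")  # ~0.727
```

Replacing `pi` with its one-hot argmax version, `np.eye(A)[pi.argmax(axis=1)]`, roughly corresponds to the deterministic estimation baseline quoted below, which only considers the highest-probability actions rather than the full action distribution.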
Stats
Our method yields precise results (see the Crazy Climber benchmark).
The deterministic estimation technique exhibits faster performance.
Naive monolithic model checking yields only bounds and does not reflect the actual performance of the RL policy.
Citations
"Our method is evaluated across various RL benchmarks and compared to an alternative approach that only builds the part of the MDP that is reachable via the highest probability actions and an approach called naive monolithic model checking."
"In contrast, the deterministic estimation technique exhibits faster performance."
"The model checking result for this safety measurement yielded P(♦goal) = 0.7, indicating that the agent has a 70% chance of safely reaching the other side of the road."