The paper presents a novel method for clustering and ranking reinforcement learning policy decisions to improve the interpretability of trained policies. The key contributions are:
A black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. This extends previous work on ranking individual policy decisions using spectrum-based fault localization techniques.
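To make the cluster-then-rank pipeline concrete, here is a minimal sketch under stated assumptions: states are fixed-length feature vectors, clustering is plain k-means, and each cluster's importance is scored with a Tarantula-style spectrum-based statistic computed over (possibly mutated) rollouts in which the policy's action is followed in some clusters and replaced by a default elsewhere. The function names and the exact scoring formula are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_states(states, n_clusters=8, seed=0):
    """Group visited states by feature similarity.

    Assumes `states` is an (n_states, d) array of feature vectors;
    the paper's clustering criterion may differ from plain k-means.
    """
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    labels = km.fit_predict(states)
    return km, labels

def rank_clusters(executions, n_clusters):
    """Rank clusters with a Tarantula-style suspiciousness statistic.

    `executions` is a list of (covered, passed) pairs: `covered` is the
    set of cluster ids in which the policy's own action was followed
    during that rollout, and `passed` says whether the rollout met the
    reward threshold.  Mirroring spectrum-based fault localization (with
    the pass/fail roles swapped, since here we want decisions that
    correlate with success), clusters covered mostly in passing rollouts
    score highest.
    """
    ep = np.zeros(n_clusters)  # covered in a passing rollout
    ef = np.zeros(n_clusters)  # covered in a failing rollout
    total_pass = sum(1 for _, passed in executions if passed)
    total_fail = len(executions) - total_pass
    for covered, passed in executions:
        for c in covered:
            if passed:
                ep[c] += 1
            else:
                ef[c] += 1
    pass_rate = ep / max(total_pass, 1)
    fail_rate = ef / max(total_fail, 1)
    scores = pass_rate / np.maximum(pass_rate + fail_rate, 1e-12)
    return np.argsort(-scores), scores  # cluster ids, most important first
```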
The clustered decision ranking provides insight into how the policy operates and captures the importance of states more accurately than statistical ranking of individual decisions.
Experiments on MiniGrid and Atari environments show that pruned policies, which follow the original policy only in states belonging to the top-ranked clusters and require no retraining, retain performance close to that of the original policy. This addresses the high complexity of trained reinforcement learning policies, a key hurdle to their wider adoption.
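A pruned policy in this spirit can be expressed as a thin wrapper that consults the trained policy only in states whose cluster survives the ranking cut, and falls back to a fixed default action elsewhere. The class below is a sketch under the same assumptions as the snippet above (the wrapper structure and the default-action choice are hypothetical, not the paper's exact construction).

```python
import numpy as np

class PrunedPolicy:
    """Follow the trained policy only in top-ranked clusters.

    `policy` maps a state to an action, `kmeans` is the fitted clustering
    model from `cluster_states`, and `ranked_clusters` is the importance
    ordering from `rank_clusters`.  Keeping only the top-k clusters yields
    a simpler policy with no retraining.
    """

    def __init__(self, policy, kmeans, ranked_clusters, k, default_action=0):
        self.policy = policy
        self.kmeans = kmeans
        self.keep = set(int(c) for c in ranked_clusters[:k])
        self.default_action = default_action

    def act(self, state):
        x = np.asarray(state, dtype=float)[None]      # shape (1, d)
        cluster = int(self.kmeans.predict(x)[0])
        if cluster in self.keep:
            return self.policy(state)                 # important state
        return self.default_action                    # cheap fallback
```

Sweeping k over the number of retained clusters then traces out the trade-off between policy simplicity and episode return, again without any retraining.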
The clustering of states itself provides important insights into the relationships between particular decisions and the overall performance of the policy.
The authors argue that this clustering-based ranking is a potentially useful technique for explainable reinforcement learning, as it enables the construction of simpler yet high-performing policies without the need for retraining.