This paper explores the use of reinforcement learning (RL) to train defensive agents in IPMSRL, a simulated Integrated Platform Management System (IPMS) environment under cyber attack. The authors introduce three environment configurations of varying difficulty; the hard configuration incorporates realistic dynamics such as false positive alerts, false negative alerts, and alert delays.
The authors first establish a baseline using a standard Proximal Policy Optimization (PPO) agent, which performs poorly in the more difficult environment configurations. To address this, they explore two guided RL techniques:
Curriculum Learning (CL): The authors gradually increase the difficulty of the environment during training, so that the agent first learns in simpler scenarios before moving to more complex ones. With CL, the agent reaches a mean episode reward of -0.569 in the hard environment, compared to -2.791 for the baseline.
Action Masking (AM): The authors implement action masking to constrain the agent's available actions based on the current state of the environment, preventing it from taking undesirable or impossible actions. This technique also leads to substantial improvements, with the agent reaching a mean episode reward of -0.743 in the hard environment.
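The action masking idea can be illustrated with a minimal sketch. It assumes a discrete policy that outputs one logit per action. The action names and the `build_mask` rules are hypothetical and are not taken from the paper. Disallowed actions get a logit of negative infinity, so they receive zero probability after the softmax.

```python
import math

# Hypothetical action set, for illustration only (not the paper's actions).
ACTIONS = ["do_nothing", "isolate_node", "restore_node"]


def build_mask(node_compromised, node_isolated):
    """Return one boolean per action: True if the action is currently allowed.

    These rules are assumptions made for the example.
    """
    return [
        True,                                    # do_nothing is always allowed
        node_compromised and not node_isolated,  # isolate only a compromised, connected node
        node_isolated,                           # restore only a node that is isolated
    ]


def masked_softmax(logits, mask):
    """Convert logits to probabilities, forcing masked-out actions to zero."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Disallowed actions get -inf, so exp() maps them to exactly 0.0.
    masked = [logit if ok else -math.inf for logit, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - peak) for logit in masked]
    total = sum(exps)
    return [e / total for e in exps]


# With a healthy, connected node only "do_nothing" remains available.
probs = masked_softmax([0.2, 1.5, -0.3], build_mask(False, False))
```

Masking at the logit level keeps the policy's output size fixed while guaranteeing that an invalid action is never sampled, which is one common way to implement the constraint described above.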
Finally, the authors combine CL and AM, which results in the highest level of performance observed, with a mean episode reward of 0.137 in the hard environment. This outperforms both the baseline and a hardcoded defensive agent, which achieved a mean episode reward of -1.895 in the hard environment.
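The curriculum component can be sketched as a simple difficulty schedule. This is an illustrative assumption, not the authors' schedule: the stage names `easy` and `medium`, the step boundaries, and the function `curriculum_stage` are all hypothetical. Only the existence of three configurations, the hardest being the hard one, comes from the summary above.

```python
# Hypothetical stage names and step boundaries (not from the paper).
STAGES = ["easy", "medium", "hard"]
BOUNDARIES = [100_000, 300_000]  # training steps at which difficulty increases


def curriculum_stage(step, boundaries=BOUNDARIES, stages=STAGES):
    """Return the environment configuration to train on at a given step."""
    if len(stages) != len(boundaries) + 1:
        raise ValueError("need exactly one more stage than boundaries")
    # Walk the boundaries in order; the first one not yet reached picks the stage.
    for boundary, stage in zip(boundaries, stages):
        if step < boundary:
            return stage
    # Past the last boundary, train on the hardest configuration.
    return stages[-1]


# Example: the configuration used at a few points during training.
schedule = [curriculum_stage(s) for s in (0, 100_000, 299_999, 300_000)]
```

In a combined setup, a training loop would call such a function to choose the environment configuration for each episode, while action masking is applied at every step regardless of the stage.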
The results demonstrate that the application of CL and AM, individually and in combination, can significantly enhance the data efficiency and overall performance of RL agents in the context of operational technology cyber security, where complex real-world dynamics need to be addressed.
by Alec Wilson,... at arxiv.org 09-18-2024
https://arxiv.org/pdf/2409.10563.pdf