The paper introduces Memory Gym, a benchmark suite of 2D partially observable environments designed to challenge the memory capabilities of decision-making agents. The authors expand the original finite versions of these environments into novel, endless formats inspired by cumulative memory games such as "I packed my bag". This shift in task design moves the focus from merely assessing sample efficiency to probing memory effectiveness in dynamic, prolonged scenarios.
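To make the "I packed my bag" idea concrete, here is a minimal, illustrative sketch (not the actual Memory Gym implementation): each round appends one command to a growing sequence, the agent must reproduce the whole sequence, and a single mistake ends the episode, so the required memory span grows without bound. All function and parameter names here are hypothetical.

```python
import random

def play_cumulative_round(commands, agent_recall):
    """One round of an 'I packed my bag'-style check: the agent must
    repeat the full growing command sequence exactly."""
    return agent_recall == commands

def run_episode(vocab, perfect_memory, max_rounds=100):
    """Illustrative episode loop: the command list grows each round,
    so the memory demand increases with every successful round."""
    commands, score = [], 0
    for _ in range(max_rounds):
        commands.append(random.choice(vocab))
        # A perfect-memory agent recalls everything; a flawed one
        # drops the latest command (toy stand-in for forgetting).
        recall = list(commands) if perfect_memory else commands[:-1]
        if not play_cumulative_round(commands, recall):
            break
        score += 1
    return score
```

Under this toy setup, an agent's score directly measures how long its memory remains effective, which mirrors why the endless formats discriminate between memory mechanisms better than fixed-length tasks.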
The authors contribute an open-source implementation of Transformer-XL (TrXL) integrated with Proximal Policy Optimization (PPO) as a memory-enhanced DRL baseline. Their comparative study between TrXL and a Gated Recurrent Unit (GRU) agent yields a surprising result: while TrXL is more sample efficient on the finite Mystery Path environment and more effective on Mortar Mayhem, GRU consistently outperforms TrXL by significant margins across all endless tasks.
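The GRU agent's memory is a single recurrent hidden state updated at every timestep, in contrast to TrXL's window of cached past activations. As a reminder of the mechanism being compared, here is a minimal NumPy sketch of one GRU step (standard GRU equations, not code from the paper's repository; the weight names are assumptions):

```python
import numpy as np

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update: gates decide how much of the past state to keep.

    x: input at this timestep, h: previous hidden state,
    W*/U*: input and recurrent weight matrices for each gate.
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde        # interpolate old and new
```

Because the hidden state is overwritten in place each step, the GRU's memory cost is constant regardless of episode length, which is one plausible reason it scales gracefully to the endless tasks.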
The authors explore several hypotheses for TrXL's underperformance in the endless environments, including network capacity, hyperparameter tuning, and the impact of advantage normalization. They also refute their initial assumption that recurrent agents are vulnerable to the spotlight perturbations in Searing Spotlights, underscoring the importance of proper hyperparameter settings.
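Advantage normalization, one of the factors examined, is a common PPO detail: advantages are standardized across the batch before the policy update so gradient scale stays stable. A minimal sketch of the usual formulation (not necessarily the paper's exact variant):

```python
import numpy as np

def normalize_advantages(adv, eps=1e-8):
    """Standardize advantage estimates to zero mean and unit variance
    across the batch, as commonly done before the PPO policy update."""
    return (adv - adv.mean()) / (adv.std() + eps)
```

In endless tasks, where returns (and hence raw advantages) can grow with episode length, whether and how this normalization is applied can plausibly change optimization behavior, which is why the authors probe it.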
by Marco Pleine... at arxiv.org, 09-19-2024
https://arxiv.org/pdf/2309.17207.pdf