Data-Regularised Environment Design for Zero-Shot Transfer in Reinforcement Learning
Data-Regularised Environment Design (DRED) generates new training levels with a generative model trained to approximate the context distribution underlying the training levels. It also uses adaptive level sampling to minimise the mutual information between the agent's internal representation and the identities of the training levels. Together, these allow DRED to achieve significant improvements in zero-shot transfer performance over existing adaptive sampling and unsupervised environment design (UED) methods.
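To make the two ingredients of DRED more concrete, here is a minimal Python sketch built on several assumptions. Levels are represented as flat lists of numeric parameters. A per-parameter Gaussian (the hypothetical `fit_gaussian`) stands in for the generative model, although a real implementation would more likely use a learned model such as a VAE. Adaptive sampling uses rank-based prioritisation in the style of Prioritized Level Replay, and `sample_training_level` mixes generated and original levels with a hypothetical probability `p_gen`. The function `mutual_information` gives a plug-in estimate of the quantity DRED aims to keep low: the mutual information between a discretised representation of the agent (for example, a cluster index of its hidden state) and the training level identity. None of these names or choices come from the source. They are illustrative only.

```python
import math
import random
from collections import Counter


def rank_priorities(scores, beta=0.1):
    """Rank-based sampling distribution over training levels (PLR-style, assumed).

    Higher-scoring levels get higher probability:
    P(i) is proportional to (1 / rank_i) ** (1 / beta).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    weights = [0.0] * len(scores)
    for rank, i in enumerate(order, start=1):
        weights[i] = (1.0 / rank) ** (1.0 / beta)
    total = sum(weights)
    return [w / total for w in weights]


def fit_gaussian(levels):
    """Toy stand-in for a generative model: per-parameter mean and std."""
    n, dims = len(levels), len(levels[0])
    means = [sum(lvl[d] for lvl in levels) / n for d in range(dims)]
    stds = [math.sqrt(sum((lvl[d] - means[d]) ** 2 for lvl in levels) / n)
            for d in range(dims)]
    return means, stds


def sample_training_level(train_levels, scores, model, p_gen=0.5, rng=random):
    """Hypothetical mixture: a generated level with probability p_gen,
    otherwise an original level chosen by rank-based prioritisation."""
    if rng.random() < p_gen:
        means, stds = model
        return "generated", [rng.gauss(m, s) for m, s in zip(means, stds)]
    probs = rank_priorities(scores)
    idx = rng.choices(range(len(train_levels)), weights=probs, k=1)[0]
    return idx, train_levels[idx]


def mutual_information(pairs):
    """Plug-in estimate of I(Z; L) in nats from (representation_id, level_id) pairs.

    Uses p(z, l) / (p(z) p(l)) = count(z, l) * n / (count(z) * count(l)).
    """
    n = len(pairs)
    joint = Counter(pairs)
    pz = Counter(z for z, _ in pairs)
    pl = Counter(lvl for _, lvl in pairs)
    return sum((c / n) * math.log(c * n / (pz[z] * pl[lvl]))
               for (z, lvl), c in joint.items())
```

If each level maps to its own representation cluster, the estimate reaches its maximum of log(number of levels). This indicates that the agent can identify which training level it is in, which is the overfitting signal that a low mutual information is meant to rule out. If representations are independent of level identity, the estimate is zero.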