Core Concepts
The author introduces the Evolution Transformer, a causal Transformer architecture for Evolution Strategies, trained through Evolutionary Algorithm Distillation (EAD) to perform in-context evolutionary optimization. The approach aims to discover powerful optimization principles via meta-optimization.
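Evolutionary Algorithm Distillation amounts to supervised imitation: the sequence model is trained to reproduce a teacher optimizer's search-distribution updates along teacher-generated trajectories. Below is a minimal NumPy sketch, assuming a toy fitness-weighted Gaussian teacher and a mean-squared imitation loss; `teacher_update`, `student_update`, and `ead_loss` are hypothetical stand-ins, and the paper's exact loss and model may differ.

```python
import numpy as np

def teacher_update(mean, sigma, population, fitness):
    # Toy Gaussian ES teacher (hypothetical stand-in for a real BBO teacher):
    # move the mean toward the fitness-weighted population average.
    weights = np.exp(-fitness)            # lower fitness = better (minimization)
    weights /= weights.sum()
    return weights @ population, sigma

def student_update(params, mean, sigma, population, fitness):
    # Hypothetical stand-in for the Evolution Transformer forward pass:
    # a linear map from (population, fitness) features to a mean update.
    feats = np.concatenate([population.ravel(), fitness])
    return mean + params @ feats, sigma

def ead_loss(params, trajectory):
    # Evolutionary Algorithm Distillation, sketched as supervised imitation:
    # the student is trained to reproduce the teacher's search-distribution
    # updates along teacher-generated optimization trajectories.
    loss = 0.0
    for mean, sigma, population, fitness in trajectory:
        t_mean, _ = teacher_update(mean, sigma, population, fitness)
        s_mean, _ = student_update(params, mean, sigma, population, fitness)
        loss += np.mean((t_mean - s_mean) ** 2)
    return loss / len(trajectory)

# Toy usage: collect a short teacher trajectory on a sphere objective.
rng = np.random.default_rng(0)
P, D = 8, 4
params = rng.normal(size=(D, P * D + P)) * 0.01
trajectory, mean, sigma = [], np.zeros(D), 1.0
for _ in range(5):
    pop = mean + sigma * rng.normal(size=(P, D))
    fit = np.sum(pop ** 2, axis=1)
    trajectory.append((mean, sigma, pop, fit))
    mean, sigma = teacher_update(mean, sigma, pop, fit)
print(ead_loss(params, trajectory))
```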
Abstract
The Evolution Transformer is designed to flexibly represent an ES across different population sizes and search-space dimensionalities. It leverages self-attention and Perceiver cross-attention modules to induce population-order invariance and dimension-order equivariance. Through supervised training with EAD, it clones various black-box optimization (BBO) algorithms and performs well on unseen tasks.
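The invariance and equivariance claims can be illustrated with a toy attention computation: attention aggregates population tokens with a weighted sum, so permuting population members leaves the latent outputs unchanged, while applying the same weight-shared module independently to each search dimension makes the outputs permute along with the dimensions. A minimal NumPy sketch follows; the shapes and the `cross_attention` module are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latents, tokens, Wq, Wk, Wv):
    # latents: (L, F) learned queries, tokens: (P, F) population tokens.
    q, k, v = latents @ Wq, tokens @ Wk, tokens @ Wv
    att = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)   # (L, P)
    return att @ v   # weighted sum over the population -> order-invariant

rng = np.random.default_rng(0)
P, D, F, L = 8, 4, 16, 2                  # population, dims, features, latents
tokens = rng.normal(size=(D, P, F))       # per-dimension population tokens
latents = rng.normal(size=(L, F))
Wq, Wk, Wv = (rng.normal(size=(F, F)) for _ in range(3))

# Shared weights applied independently per search dimension: permuting the
# dimensions simply permutes the per-dimension outputs (equivariance).
out = np.stack([cross_attention(latents, tokens[d], Wq, Wk, Wv) for d in range(D)])

# Population-order invariance: shuffling population members leaves outputs unchanged.
perm = rng.permutation(P)
out_perm = np.stack([cross_attention(latents, tokens[d][perm], Wq, Wk, Wv) for d in range(D)])
assert np.allclose(out, out_perm)
```

The invariance here comes purely from the attention-weighted sum over population tokens, independent of any particular learned weights.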
Key points:
- Introduction of the Evolution Transformer for ES.
- Training through Evolutionary Algorithm Distillation.
- Flexibility in representing ES for different settings.
- Leveraging self-attention and Perceiver modules.
- Successful cloning of various BBO algorithms.
- Performance on unseen tasks after training (see the in-context rollout sketch after this list).
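At meta-test time, an algorithm distilled this way can be rolled out in-context on tasks it never saw during training: each generation's population and fitness values are appended to the context, and the model proposes the next search-distribution update. Below is a hedged sketch, with `model_update` as a hypothetical stand-in for the trained Evolution Transformer.

```python
import numpy as np

def rollout_in_context(model_update, objective, dim, pop_size, num_gens, seed=0):
    # Deploy a distilled model as an ES on an unseen task: sample a population,
    # evaluate it, append to the context, and let the model update the
    # search distribution.
    rng = np.random.default_rng(seed)
    mean, sigma = np.zeros(dim), 1.0
    context = []                                   # in-context trajectory
    for _ in range(num_gens):
        population = mean + sigma * rng.normal(size=(pop_size, dim))
        fitness = np.array([objective(x) for x in population])
        context.append((population, fitness))
        mean, sigma = model_update(context, mean, sigma)
    return mean

# Toy usage with a hand-written `model_update` standing in for the trained
# network, applied to a sphere objective as the "unseen" task.
def model_update(context, mean, sigma):
    population, fitness = context[-1]
    elite = population[np.argsort(fitness)[: len(population) // 4]]
    return elite.mean(axis=0), 0.9 * sigma

best = rollout_in_context(model_update, lambda x: np.sum(x ** 2),
                          dim=4, pop_size=16, num_gens=50)
print(best)
```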
Stats
Evolution Transformer robustly outperforms Evolutionary Optimization baselines.
Aggregated results are reported over 8 Brax tasks and 5 individual runs.
Population size used for Brax tasks: 128.
Quotes
"Evolutionary optimization algorithms struggle with leveraging information obtained during optimization."
"Evolution Transformer outputs performance-improving updates based on trajectory evaluations."
"EvoTF generalizes to previously unseen optimization problems."