Core Concepts
Natural evolution strategies can be extended to discrete parameter spaces, enabling the optimization of models with discrete parameters without explicit gradient computation.
Abstract
The paper introduces discrete natural evolution strategies (NES), an extension of the NES algorithm to handle discrete parameter spaces. NES is a class of black-box optimizers that estimate gradients through Monte Carlo sampling, without requiring explicit gradient computation.
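To make the Monte Carlo gradient estimation concrete, here is a minimal sketch (not taken from the paper) of the score-function search-gradient estimate for a categorical search distribution over a single discrete parameter. The objective f, the number of categories, and the learning rate in the usage example are illustrative assumptions.

import numpy as np

def nes_search_gradient(logits, f, n_samples=100, rng=np.random.default_rng(0)):
    # Softmax turns unconstrained logits into categorical probabilities.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    xs = rng.choice(len(p), size=n_samples, p=p)      # sample discrete parameters
    fitness = np.array([f(x) for x in xs])
    # Score of a categorical with softmax logits: d log p(x) / d logits = onehot(x) - p.
    scores = np.eye(len(p))[xs] - p
    # Monte Carlo estimate of the search gradient E[f(x) * score(x)].
    return (fitness[:, None] * scores).mean(axis=0)

# Toy usage: concentrate the search distribution on the best of five discrete choices.
f = lambda x: -(x - 3) ** 2
logits = np.zeros(5)
for _ in range(200):
    logits += 0.1 * nes_search_gradient(logits, f)    # plain gradient ascent, no FIM

As the distribution concentrates on the best choice, its entropy falls automatically, which is the behavior the paper relies on instead of an explicit variance update.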
The key insights are:
- For discrete parameter spaces, the natural gradient can be computed efficiently without inverting the Fisher information matrix (FIM), because the entropy (the discrete analogue of variance) is implicitly adapted through the search-gradient updates (a minimal sketch of such an update appears below).
- The paper demonstrates the effectiveness of discrete NES on a program induction task, where discrete parameters like operators and conditionals are optimized alongside continuous parameters.
- Discrete NES is shown to outperform a variational optimization baseline in terms of training stability and convergence speed.
- An ablation study is performed to investigate the effect of the FIM, revealing that it is not necessary for discrete NES and can even be detrimental to performance.
The paper showcases the practical utility of discrete NES as a solver for optimization problems involving discrete parameters, paving the way for its application to more complex real-world tasks.
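As a concrete illustration of the points above, the following is a self-contained sketch of a discrete-NES-style loop on a toy program-induction problem: a categorical search distribution over candidate thresholds and a Gaussian over two continuous constants are both updated with plain score-function search gradients and no Fisher information matrix. This is not the paper's implementation; the program template, hyperparameters, and fitness normalization are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
thresholds = np.array([1.5, 2.5, 3.5, 4.5])        # candidate discrete thresholds
logits = np.zeros(len(thresholds))                 # categorical search distribution
mu, sigma = np.array([1.0, 1.0]), 0.5              # Gaussian over constants (a, b)

def target(x):                                     # program to be recovered
    return 4.2 * x if x > 3.5 else 2.1 * x

def candidate(x, t, a, b):                         # program template being induced
    return a * x if x > t else b * x

xs_eval = np.linspace(0.0, 6.0, 20)                # inputs used to score programs
n_samples, lr = 100, 0.05

for _ in range(1000):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    ks = rng.choice(len(p), size=n_samples, p=p)               # discrete samples
    consts = mu + sigma * rng.standard_normal((n_samples, 2))  # continuous samples
    fitness = np.array([
        -np.mean([(candidate(x, thresholds[k], a, b) - target(x)) ** 2 for x in xs_eval])
        for k, (a, b) in zip(ks, consts)
    ])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # normalize fitness
    # Plain score-function search gradients; no Fisher information matrix needed.
    grad_logits = (fitness[:, None] * (np.eye(len(p))[ks] - p)).mean(axis=0)
    grad_mu = (fitness[:, None] * (consts - mu) / sigma ** 2).mean(axis=0)
    logits += lr * grad_logits
    mu += lr * grad_mu

print(thresholds[np.argmax(logits)], mu)           # expected to drift toward 3.5 and (4.2, 2.1)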
Stats
# Example program combining a discrete conditional (x > 3.5) with continuous constants (4.2, 2.1)
def program(x):
    if x > 3.5:
        return 4.2 * x
    else:
        return x * 2.1
Quotes
"NES is very similar to variational optimization (Staines & Barber, 2012) as both estimate the gradient as a Monte Carlo expectation. The major difference lies in the gradient of the search distribution: VO uses the gradient of the probability, while NES uses the score. This subtle difference results in VO computing a lower bound on the true objective, while NES computes an exact gradient in the limit of infinite samples."