Core Concepts
Swarm behaviors can be evolved by minimizing surprise, that is, the prediction error of the swarm robots' sensor values, without the need for a task-specific reward function.
Abstract
The paper presents an in-depth analysis of the "minimize surprise" approach for evolving swarm behaviors. In simple 2D grid simulations, the authors show that this approach can lead to the emergence of diverse self-assembly behaviors, such as lines, pairs, squares, and aggregation. The key idea is to equip each swarm robot with an actor-predictor pair of artificial neural networks and to reward the pair only for how accurately the predictor forecasts the robot's sensor values. Because this fitness function is task-independent, a variety of behaviors evolves as a byproduct, without the need to specify a task-specific reward function.
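To make the idea of a task-independent fitness concrete, here is a minimal sketch of a prediction-accuracy score. It assumes binary sensor values and a fitness equal to the fraction of sensor values predicted correctly, averaged over time steps, sensors, and robots. The function names and the data layout are hypothetical and chosen for illustration; the summary does not give the paper's exact formula.

```python
from typing import Sequence


def prediction_fitness(
    predictions: Sequence[Sequence[int]],
    sensors: Sequence[Sequence[int]],
) -> float:
    """Fraction of sensor values one robot predicted correctly.

    predictions[t][r] is the value predicted for sensor r at time step t,
    sensors[t][r] is the value actually observed. Both are assumed to be
    binary (0 or 1); this layout is an illustrative assumption.
    """
    correct = 0
    total = 0
    for predicted_step, observed_step in zip(predictions, sensors):
        for predicted, observed in zip(predicted_step, observed_step):
            # 1 - |p - s| is 1 for a correct binary prediction, 0 otherwise.
            correct += 1 - abs(predicted - observed)
            total += 1
    if total == 0:
        raise ValueError("no sensor values to score")
    return correct / total


def swarm_fitness(
    all_predictions: Sequence[Sequence[Sequence[int]]],
    all_sensors: Sequence[Sequence[Sequence[int]]],
) -> float:
    """Mean prediction fitness over all robots in the swarm (assumed averaging)."""
    scores = [
        prediction_fitness(p, s) for p, s in zip(all_predictions, all_sensors)
    ]
    if not scores:
        raise ValueError("swarm is empty")
    return sum(scores) / len(scores)


# One robot, two time steps, three sensors: five of six values match.
robot_predictions = [[1, 0, 0], [0, 1, 1]]
robot_sensors = [[1, 0, 1], [0, 1, 1]]
score = prediction_fitness(robot_predictions, robot_sensors)
```

Nothing in this score refers to lines, squares, or aggregation. Under these assumptions, any behavior that makes a robot's sensor readings easy to forecast scores well, which is why structured formations can emerge without being specified.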
The authors analyze the emergent behaviors in detail, including their prediction accuracy, the relationship between predictions and formed structures, and the effectiveness of the minimize surprise approach compared to random search. They also show that the minimize surprise approach is competitive with the divergent search method of novelty search in terms of generating behavioral diversity.
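For readers unfamiliar with novelty search, the sketch below shows the commonly used novelty score: the mean distance from an individual's behavior descriptor to its k nearest neighbors among previously seen behaviors. The behavior descriptor (a plain numeric vector), the Euclidean distance, and the value of k are assumptions made for illustration; the summary does not state which descriptor or parameters the authors used.

```python
import math
from typing import Sequence


def novelty_score(
    behavior: Sequence[float],
    others: Sequence[Sequence[float]],
    k: int = 3,
) -> float:
    """Mean Euclidean distance from `behavior` to its k nearest neighbors.

    `others` holds the behavior descriptors of the current population and
    the archive. The descriptor itself is a hypothetical numeric vector.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    if not nearest:
        raise ValueError("need at least one other behavior to compare against")
    return sum(nearest) / len(nearest)


# A behavior far from everything seen so far scores higher than a familiar one.
seen = [(0.0, 1.0), (0.0, 2.0), (3.0, 4.0), (6.0, 8.0)]
familiar = novelty_score((0.0, 0.0), seen, k=2)
unusual = novelty_score((20.0, 20.0), seen, k=2)
```

Novelty search selects directly for behaviors that are far from what has been seen, whereas minimize surprise selects for prediction accuracy and obtains diversity only as a side effect. That difference is what makes the reported competitiveness notable.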
Furthermore, the authors demonstrate that the minimize surprise approach can be scaled up to more realistic simulations and even real-world multi-robot experiments, where they evolve basic swarm behaviors and object manipulation behaviors. Overall, the paper provides a comprehensive overview of the minimize surprise approach and its advantages in evolving diverse and effective swarm behaviors.
Stats
"At least 71% of the sensor values were predicted correctly by the evolved predictors."
"The median best fitness in the last generation ranges from 0.71 (L = 15) to 0.93 (L = 29)."
"The best evolved individuals reach significantly better solution quality than the selected random individuals for L ≥ 19."
Quotes
"Approaches of so-called innate motivation try to avoid the specific formulation of rewards and work instead with different drivers, such as curiosity."
"The minimization of free energy can then be achieved by an organism either adjusting its actions, such that they lead to sensor values matching the predictions, or optimizing the internal world model or predictor."
"A unique advantage of the swarm robot case is that swarm members populate the robot's environment and can trigger more active behaviors in a self-referential loop."