The paper proposes a theory explaining how biological evolution can produce DAL abilities that are more efficient than reward-driven learning (RDL). The key idea is that RDL provides a consistent learning process that evolution can gradually consolidate into more efficient DAL by integrating non-reward information into the learning process.
The authors set up a computational model in which a population of neural networks (NNs) learns a navigation task. The NNs use reinforcement learning (RL) as an approximation of RDL and a neuromodulation (NM) mechanism to integrate non-reward information into the learning process.
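As a rough illustration of this setup, the sketch below evolves a population of tiny networks whose fitness is how well they learn a stand-in task during their lifetime via a simple policy-gradient (REINFORCE) update. The task, network size, update rule, and hyperparameters are all placeholders rather than the authors' actual model.

```python
# Minimal sketch (not the authors' code): an evolutionary outer loop over
# genomes, each evaluated by how well it learns within its lifetime using a
# REINFORCE-style update as a stand-in for RL.
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_OUT = 4, 2                  # toy observation / action sizes (assumed)
POP, GENERATIONS, LIFETIME = 20, 50, 100

def lifetime_fitness(genome):
    """Let one genome learn for LIFETIME episodes; return its average reward."""
    w = genome.copy()
    total = 0.0
    for _ in range(LIFETIME):
        obs = rng.normal(size=N_IN)                  # stand-in observation
        logits = obs @ w
        p = np.exp(logits - logits.max()); p /= p.sum()
        a = rng.choice(N_OUT, p=p)
        reward = 1.0 if a == (obs[0] > 0) else 0.0   # stand-in task
        # Reward-weighted log-likelihood gradient (REINFORCE)
        w += 0.1 * reward * np.outer(obs, np.eye(N_OUT)[a] - p)
        total += reward
    return total / LIFETIME

pop = [rng.normal(scale=0.1, size=(N_IN, N_OUT)) for _ in range(POP)]
for g in range(GENERATIONS):
    scores = np.array([lifetime_fitness(ind) for ind in pop])
    parents = [pop[i] for i in np.argsort(scores)[-POP // 2:]]  # select top half
    pop = [p + rng.normal(scale=0.02, size=p.shape)             # mutate offspring
           for p in parents for _ in range(2)]
```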
The evolutionary dynamics observed in the model support the proposed theory. Initially, RL alone drives learning progress. Over generations, NM-based learning abilities gradually emerge and become the dominant learning mechanism, eliminating reliance on reward information altogether. The evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents, learning exclusively from non-reward information using local NM-based weight updates.
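To make "local NM-based weight updates" concrete, here is a hedged sketch of one common formulation from the neuromodulated-plasticity literature: a Hebbian rule gated by a modulatory signal, in which each weight change uses only locally available pre- and post-synaptic activity and an evolved modulatory signal, with no reward term. The actual rule used in the paper may differ.

```python
import numpy as np

def nm_hebbian_update(w, pre, post, m, lr=0.01):
    """Modulated Hebbian rule (assumed form): dw[i, j] = lr * m * pre[i] * post[j].
    Reward-free and local: each weight change depends only on the activity of
    the two neurons it connects, gated by the neuromodulatory signal m."""
    return w + lr * m * np.outer(pre, post)

# Toy usage: in the evolved agents, m would itself be produced by part of the
# network from non-reward observations (assumed interface, not from the paper).
w = np.zeros((4, 2))
pre = np.array([0.5, -0.2, 0.1, 0.8])   # presynaptic activations
post = np.array([0.9, 0.3])             # postsynaptic activations
m = 0.7                                 # neuromodulatory gating signal
w = nm_hebbian_update(w, pre, post, m)
```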
The authors analyze the learning process of the evolved agents, observing a transition from random trial-and-error to focused experimentation for gathering task-relevant information. They discuss the implications of their findings for understanding biological intelligence and developing more efficient AI learning algorithms.