The paper proposes a theory explaining how biological evolution can produce DAL abilities that are more efficient than reward-driven learning. The key idea is that reward-driven learning (RDL) provides a consistent learning process that evolution can then consolidate into more efficient DAL by gradually integrating non-reward information into learning.
The authors set up a computational model where a population of neural networks (NNs) learns a navigation task. The NNs use Reinforcement Learning (RL) as an approximation of RDL, and a neuromodulation (NM) mechanism to integrate non-reward information into the learning process.
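To picture this setup, here is a minimal toy sketch, not the authors' implementation: an outer evolutionary loop over a population of agents whose genomes encode both plastic weights and neuromodulation (NM) parameters, with a stand-in lifetime loop in place of the actual navigation task and RL component. All names, shapes, and the toy task are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

POP_SIZE, GENERATIONS, N_PARAMS = 50, 100, 32

def lifetime_learning(genome, episodes=20):
    """Toy stand-in for one agent's lifetime.

    weights : plastic connection weights, updated during the lifetime
    nm_gain : evolvable neuromodulation parameters gating those updates
    The navigation task and the RL component of the paper are replaced
    here by a dummy target-matching task for brevity.
    """
    weights = genome[:N_PARAMS // 2].copy()
    nm_gain = genome[N_PARAMS // 2:]
    target = np.ones_like(weights)           # stand-in for the task solution
    total_return = 0.0
    for _ in range(episodes):
        error = target - weights              # non-reward, observation-like signal
        modulation = np.tanh(nm_gain)         # evolved gating of plasticity
        weights += 0.1 * modulation * error   # local, NM-gated weight update
        total_return += -np.mean(error ** 2)  # higher is better
    return total_return

def evolve():
    population = rng.normal(0.0, 0.1, size=(POP_SIZE, N_PARAMS))
    for _ in range(GENERATIONS):
        fitness = np.array([lifetime_learning(g) for g in population])
        elite = population[np.argsort(fitness)[-POP_SIZE // 4:]]   # truncation selection
        parents = elite[rng.integers(len(elite), size=POP_SIZE)]
        population = parents + rng.normal(0.0, 0.02, size=parents.shape)  # mutation
    fitness = np.array([lifetime_learning(g) for g in population])
    return population, fitness

if __name__ == "__main__":
    _, fit = evolve()
    print("best fitness after evolution:", fit.max())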
The evolutionary dynamics observed in the model support the proposed theory. Initially, RL alone drives learning progress. Over generations, NM-based learning abilities gradually emerge and become the dominant learning mechanism, eliminating reliance on reward information altogether. The evolved DAL agents show a 300-fold increase in learning speed compared to pure RL agents, learning exclusively from non-reward information using local NM-based weight updates.
The authors analyze the learning process of the evolved agents, observing a transition from random trial-and-error to focused experimentation for gathering task-relevant information. They discuss the implications of their findings for understanding biological intelligence and developing more efficient AI learning algorithms.
Key insights distilled from the paper by Solvi Arnold... at arxiv.org, 04-22-2024: https://arxiv.org/pdf/2404.12631.pdf