The paper proposes a Traveling Wave based Working Memory (TWM) model that stores working memory variables as waves of neural activity propagating through a neural substrate. The key insights are:
The TWM model can represent any history-dependent dynamical system (HDS) by encoding the past state history as traveling waves. The waves propagate through the neural substrate, and a boundary condition at the start of the substrate computes the next state from the stored history.
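As a concrete but hedged illustration of this encoding, the sketch below treats the neural substrate as a buffer of L slots: at each step the stored states shift one slot deeper (the traveling wave), and a boundary function g maps the whole stored history to the next state. The slot layout, the name g, and the toy update rule are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def twm_step(memory, g):
    """One step of a traveling-wave working memory (illustrative sketch).

    memory : (L, d) array holding the last L states; row 0 is the start
             boundary, higher rows are older states that have propagated
             further into the substrate.
    g      : boundary-condition function mapping the stored history to the
             next state (this is what makes the system history-dependent).
    """
    new_state = g(memory)                  # boundary condition: next state from the full history
    memory = np.roll(memory, 1, axis=0)    # wave propagation: every stored state moves one slot deeper
    memory[0] = new_state                  # inject the new state at the start boundary
    return memory

# Toy history-dependent rule: the next state depends on the mean of the stored history.
L, d = 8, 3
memory = np.zeros((L, d))
for t in range(20):
    memory = twm_step(memory, lambda m: np.tanh(m.mean(axis=0) + 1.0))
```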
Two specific cases of the TWM model are analyzed: a Linear Boundary Condition (LBC) and a Self-Attention Boundary Condition (SBC). The LBC case is shown to be equivalent to the dynamics of Recurrent Neural Networks (RNNs), providing a new perspective on how RNNs may be encoding working memory. The SBC case provides a justification for the autoregressive computations in Transformer architectures.
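To make the LBC-to-RNN correspondence concrete, here is a minimal sketch under assumed dimensions and block layout (not the paper's exact construction): flattening the L wave slots into a single hidden vector, wave propagation becomes a block shift matrix and the linear boundary condition becomes the first block of rows, so one TWM step takes the same form as a linear RNN update.

```python
import numpy as np

# Sketch of the Linear Boundary Condition (LBC) case.  The block layout is an
# illustrative assumption: the hidden state flattens L wave slots of size d,
# the boundary condition is a linear map B of the whole history, and wave
# propagation copies each slot one position deeper.
L, d = 4, 2
n = L * d

B = 0.1 * np.random.randn(d, n)        # linear boundary condition over the stored history

W = np.zeros((n, n))
W[:d, :] = B                           # first block of rows: compute the next state
for k in range(L - 1):                 # remaining rows: shift slot k into slot k + 1
    W[(k + 1) * d:(k + 2) * d, k * d:(k + 1) * d] = np.eye(d)

h = np.random.randn(n)
h_next = W @ h                         # same form as the recurrence h_{t+1} = W h_t in a linear RNN
# A vanilla RNN would additionally apply a nonlinearity and an input term,
# e.g. h_{t+1} = tanh(W h_t + U x_t); the shift-plus-boundary structure of W
# is what the TWM perspective adds.
```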
Empirical analysis of trained RNNs reveals that they converge to the TWM encoding, with the hidden states exhibiting traveling wave patterns that store the recent past. A basis transformation can reveal the underlying TWM structure in the trained RNN parameters.
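In the spirit of that empirical analysis, one simple diagnostic (my choice of method, not the paper's analysis pipeline) is to check whether the input from k steps ago is linearly decodable from the current hidden state, which is what a traveling-wave code predicts. The least-squares decoder and the idealized toy state below are assumptions for illustration; in practice H would be hidden states recorded from a trained RNN.

```python
import numpy as np

def lag_decoding_r2(H, X, k):
    """R^2 of linearly decoding the input from k steps ago (x_{t-k}) out of h_t."""
    T = H.shape[0]
    feats, target = H[k:], X[:T - k]                   # pair h_t with x_{t-k}
    W, *_ = np.linalg.lstsq(feats, target, rcond=None)
    resid = ((target - feats @ W) ** 2).sum()
    total = ((target - target.mean(axis=0)) ** 2).sum()
    return 1.0 - resid / total

# Toy check on an idealized traveling-wave state h_t = [x_t, x_{t-1}, x_{t-2}]:
# lags 0..2 should be decodable (R^2 near 1), lag 3 should not be.
T, d, L = 500, 2, 3
X = np.random.randn(T, d)
H = np.stack([np.concatenate([X[t - k] for k in range(L)]) for t in range(L, T)])
print([round(float(lag_decoding_r2(H, X[L:], k)), 3) for k in range(L + 1)])
```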
The TWM framework also explains how this encoding can alleviate the vanishing gradient problem in RNNs by removing the need to propagate gradients backwards through many time steps.
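A rough worked comparison (my notation, not the paper's derivation) of why the wave encoding helps: in a generic RNN with update $h_{t+1} = \sigma(W h_t + U x_t)$, the gradient of a loss at time $t+k$ with respect to $h_t$ carries the product

$$\frac{\partial h_{t+k}}{\partial h_t} = \prod_{j=1}^{k} \operatorname{diag}\!\big(\sigma'(\cdot)\big)\, W,$$

which tends to vanish or explode unless the spectral radius of $W$ is close to one. In the TWM encoding, the wave-propagation part of the recurrence is a pure shift, so the copy of $h_t$ stored in the wave reaches the boundary with an identity Jacobian, and the loss can act on that stored copy directly rather than through $k$ successive multiplications by $W$.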
Overall, the paper provides a theoretical foundation for understanding working memory mechanisms in both biological and artificial neural networks, and suggests potential avenues for improving neural network architectures.