Core Concepts
Representational drift, the gradual change in neural tuning over time even in constant environments, arises from continuous learning in the presence of noise: noise drives a directed movement within the low-loss manifold toward flatter regions of the loss landscape.
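The mechanism can be illustrated with a minimal sketch (a hand-built toy, not the paper's network or task): gradient descent with noisy targets on a loss whose global minima form a continuous manifold. The parameters reach zero loss quickly, then keep drifting along the zero-loss manifold toward its flattest point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy loss with a continuum of global minima: L(w1, w2) = (w1*w2 - 1)^2.
# Every point on the hyperbola w1*w2 = 1 has zero loss, but the non-zero
# Hessian eigenvalue there grows with w1^2 + w2^2, so the flattest minimum
# is the balanced one at w1 = w2 = 1.
def noisy_grad(w, noise_std):
    residual = w[0] * w[1] - 1.0 + noise_std * rng.standard_normal()
    return 2.0 * residual * np.array([w[1], w[0]])

w = np.array([4.0, 0.25])      # start at a sharp zero-loss minimum (4 * 0.25 = 1)
lr, noise_std = 0.02, 0.3
for _ in range(100_000):
    w -= lr * noisy_grad(w, noise_std)

# The loss was already ~0 at the start, yet prolonged noisy training moved
# the parameters along the zero-loss manifold to the flat, balanced solution.
print(w)   # both coordinates close to 1.0
```

Here the noise enters through the target; the paper varies the noise source and shows that different noise statistics yield different implicit regularizers, but the drift-toward-flatness picture is the same.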
Abstract
The paper examines representational drift, the phenomenon in which the tuning of individual neurons changes over time even in constant environments. The authors propose that this drift is a consequence of continuous learning in the presence of noise, and that it can be modeled using artificial neural networks.
The key insights are:
Training a neural network on a predictive coding task leads to the development of spatially tuned units, similar to place cells in the hippocampus. Continued training then gradually sparsifies the neural activity: fewer units remain active, and each remaining unit becomes more informative.
This sparsification and increase in tuning specificity is consistent with experimental observations in the CA1 region of the hippocampus, where the number of active place cells decreases while their spatial information content increases over time.
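The two quantities tracked here can be computed directly from rate maps. A minimal sketch (my own helper functions, assuming uniform occupancy, a simple peak-rate threshold for "active", and the standard Skaggs spatial-information formula in bits/spike):

```python
import numpy as np

def active_fraction(rate_maps, thresh=0.1):
    """Fraction of units whose peak rate exceeds `thresh` (rows: units, cols: positions)."""
    return np.mean(rate_maps.max(axis=1) > thresh)

def spatial_info(rate_map):
    """Skaggs spatial information (bits/spike), assuming uniform occupancy."""
    p = 1.0 / rate_map.size          # occupancy probability per spatial bin
    mean_rate = rate_map.mean()
    if mean_rate == 0:
        return 0.0
    r = rate_map / mean_rate         # rate relative to the mean
    nz = r > 0
    return np.sum(p * r[nz] * np.log2(r[nz]))

# Sharply tuned unit: fires in 1 of 16 bins -> SI = log2(16) = 4 bits/spike.
sharp = np.zeros(16); sharp[3] = 8.0
# Broadly tuned unit: uniform firing -> SI = 0 bits/spike.
broad = np.full(16, 0.5)

print(spatial_info(sharp))   # 4.0
print(spatial_info(broad))   # 0.0
```

Under these definitions, the reported trend is a decrease of `active_fraction` together with an increase of the mean `spatial_info` over the active units.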
The authors connect this sparsification effect to changes in the Hessian of the loss function, in line with recent machine learning theory. Specifically, as the network moves toward a flatter region of the loss landscape, the number of non-zero Hessian eigenvalues decreases; the vanishing eigenvalues correspond to units falling silent, yielding sparser representations.
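The link between silent units and zero Hessian eigenvalues can be made concrete in a toy setting (a hand-built two-unit ReLU network, not the paper's model): a unit that is inactive on every input contributes only zero rows and columns to the Hessian, so each silenced unit converts several eigenvalues to exactly zero.

```python
import numpy as np

# Loss of a two-unit ReLU net f(x) = a1*relu(w1*x + b1) + a2*relu(w2*x + b2).
X = np.array([0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

def loss(theta):
    w1, b1, a1, w2, b2, a2 = theta
    h1 = np.maximum(0.0, w1 * X + b1)
    h2 = np.maximum(0.0, w2 * X + b2)
    return np.mean((a1 * h1 + a2 * h2 - y) ** 2)

# Unit 2 is silent: w2*x + b2 = -x - 0.5 < 0 for every input in X.
theta = np.array([2.0, 0.1, 1.0, -1.0, -0.5, 1.0])

# Central finite-difference Hessian of the loss at theta.
eps, n = 1e-5, 6
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        pp = theta.copy(); pp[i] += eps; pp[j] += eps
        pm = theta.copy(); pm[i] += eps; pm[j] -= eps
        mp = theta.copy(); mp[i] -= eps; mp[j] += eps
        mm = theta.copy(); mm[i] -= eps; mm[j] -= eps
        H[i, j] = (loss(pp) - loss(pm) - loss(mp) + loss(mm)) / (4 * eps**2)

eigs = np.linalg.eigvalsh((H + H.T) / 2)
# The silent unit's three parameters (w2, b2, a2) do not affect the loss at
# all near theta, so exactly three eigenvalues are zero here.
print(np.sum(np.abs(eigs) < 1e-6))   # prints 3
```

Silencing a unit thus removes its parameter directions from the set of non-zero-curvature directions, which is one way a sparser representation and a flatter (lower-rank) Hessian go together.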
The authors propose that the learning process can be divided into three overlapping phases: (i) fast familiarity with the environment, (ii) slow implicit regularization leading to directed drift, and (iii) a steady state of null drift.
The authors demonstrate the generality of this phenomenon by systematically varying the task, activation function, and learning rule, and showing that the sparsification dynamics are robust to these changes, except for the case of label noise.
The authors suggest that the statistics of representational drift can be used to infer the learning rule implemented by the network, as different noise statistics lead to different implicit regularizations.
Stats
"The network quickly converged to a low loss and stayed at the same loss during the additional training period (Fig 2B)."
"The fraction of active units decreased slowly while their tuning specificity increased (Fig 2C)."
"The correlation matrix of the rate maps over time showed a gradual change that slowed down (Fig 2E)."
"All datasets are consistent with our simulations - namely that the fraction of active cells reduces while the mean SI per cell increases over a long timescale (Fig 3)."
Quotes
"Representational drift has been suggested to be a consequence of continuous learning under noise, but its properties are still not fully understood."
"We conclude that learning is divided into three overlapping phases: (i) Fast familiarity with the environment; (ii) slow implicit regularization; (iii) a steady state of null drift."
"The variability in drift dynamics opens the possibility of inferring learning algorithms from observations of drift statistics."