Core Concepts
Numerical issues in empirical estimation of memory capacity in linear echo state networks lead to inaccurate results that contradict well-established theoretical bounds.
Abstract
The paper examines the problem of accurately estimating the memory capacity (MC) of linear echo state networks (LESNs), a type of recurrent neural network. It is shown that numerical evaluations of MC reported in the literature often contradict the theoretical upper bound of N, where N is the dimension of the state space.
The authors first provide background on the definition of MC and its relation to the Kalman controllability matrix. They then demonstrate that linear models generically have maximal memory capacity, i.e., MC = N, provided the reservoir matrix A and the input mask C satisfy certain algebraic conditions; in essence, the pair (A, C) must be controllable, meaning the Kalman controllability matrix [C, AC, ..., A^{N-1}C] has full rank N.
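To make the controllability condition concrete, here is a minimal NumPy sketch (my own illustration, not the authors' code) that builds the Kalman controllability matrix for a randomly drawn reservoir and checks that it generically has full rank N, so that MC = N:

```python
import numpy as np

def controllability_matrix(A, C):
    """Kalman controllability matrix R = [C, AC, ..., A^(N-1) C]."""
    N = A.shape[0]
    cols, v = [], C.copy()
    for _ in range(N):
        cols.append(v)
        v = A @ v
    return np.column_stack(cols)

rng = np.random.default_rng(0)
N = 8                                             # kept small on purpose: see note below
A = rng.standard_normal((N, N))
A *= 0.9 / np.abs(np.linalg.eigvals(A)).max()     # spectral radius < 1 (echo state property)
C = rng.standard_normal(N)

R = controllability_matrix(A, C)
print(np.linalg.matrix_rank(R))                   # generically N, hence MC = N
```

N is kept small deliberately: R is a Krylov matrix whose columns decay geometrically, so for larger N its numerical rank computed this way collapses, which is precisely the naive-algebraic failure mode discussed below.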
However, the paper identifies two main issues with standard numerical approaches for estimating MC:
Monte Carlo estimation: The sample-based estimator of MC is shown to be positively biased, especially at large lags τ, because the covariance matrices involved in the computation are ill-conditioned. The result is estimates that overestimate the true MC, sometimes beyond the bound N (see the sketch after this list).
Naive algebraic estimation: Direct algebraic computation of MC from the Kalman controllability matrix also suffers from numerical instabilities: the Krylov-structured matrix is severely ill-conditioned, so its numerical rank, and hence the estimated MC, falls below the true value.
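The Monte Carlo failure mode can be reproduced with a short simulation. The following sketch (an illustration under my assumptions, not the authors' code) estimates MC by summing sample-based MC_τ terms; each term is a sample R² carrying a positive finite-sample bias, and inverting the ill-conditioned empirical state covariance compounds the error, so summing over many lags inflates the total:

```python
import numpy as np

def mc_monte_carlo(A, C, T=20_000, max_lag=100, seed=1):
    """Sample-based MC estimate: sum over lags tau of the R^2 of
    regressing z_{t-tau} linearly on the state x_t."""
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(T)
    X = np.zeros((T, N))
    for t in range(1, T):                 # x_t = A x_{t-1} + C z_t
        X[t] = A @ X[t - 1] + C * z[t]
    burn = max(200, max_lag)              # drop transient, keep all lags valid
    Xc = X[burn:] - X[burn:].mean(axis=0)
    Gx = Xc.T @ Xc / len(Xc)              # empirical state covariance: ill-conditioned
    mc = 0.0
    for tau in range(max_lag + 1):
        zt = z[burn - tau:T - tau]
        zt = zt - zt.mean()
        cov = Xc.T @ zt / len(zt)         # empirical Cov(x_t, z_{t-tau})
        mc += cov @ np.linalg.solve(Gx, cov) / zt.var()
    return mc                             # with max_lag >> N this tends to drift upward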
To address these challenges, the authors propose robust numerical methods that exploit the Krylov structure of the controllability matrix and the neutrality of MC to the choice of input mask. These techniques, called the orthogonalized subspace method and the averaged orthogonalized subspace method, are shown to accurately recover the theoretical MC of N for linear echo state networks.
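As I read the method, the key idea is to avoid ever forming the ill-conditioned controllability matrix explicitly, instead building an orthonormal basis of its column span one Krylov vector at a time (an Arnoldi-style iteration), and then averaging the result over random input masks. A minimal sketch under that reading; the function names and tolerance are my own choices, not the paper's:

```python
import numpy as np

def mc_orthogonalized(A, C, tol=1e-10):
    """MC as the dimension of the controllability (Krylov) subspace,
    built by modified Gram-Schmidt so the ill-conditioned matrix
    [C, AC, ..., A^(N-1) C] is never formed explicitly."""
    N = A.shape[0]
    basis = []
    v = C / np.linalg.norm(C)
    for _ in range(N):
        for q in basis:                   # orthogonalize against current basis
            v = v - (q @ v) * q
        nv = np.linalg.norm(v)
        if nv < tol:                      # subspace saturated: stop
            break
        q = v / nv
        basis.append(q)
        v = A @ q                         # next Krylov direction
    return len(basis)

def mc_averaged(A, n_masks=10, seed=0):
    """Average over random input masks, exploiting the neutrality of
    MC to the choice of C."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    return float(np.mean([mc_orthogonalized(A, rng.standard_normal(N))
                          for _ in range(n_masks)]))
```

For a generic reservoir, mc_orthogonalized returns N, in line with the full-memory result the paper recovers.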
The paper concludes that many previous efforts to optimize the memory capacity of linear recurrent networks were afflicted by numerical pathologies and conveyed misleading results.
Stats
The memory capacity MC_τ at lag τ is given by:

MC_τ = Cov(z_{t-τ}, x_t) Γ_x^{-1} Cov(x_t, z_{t-τ}) / Var(z_t)

where z_t is the scalar input, x_t ∈ R^N is the reservoir state, and Γ_x = Cov(x_t, x_t) is the state covariance matrix.
The total memory capacity is the sum MC = Σ_τ MC_τ over all lags τ, and for an N-dimensional state space it satisfies the bound MC ≤ N.
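For a linear state recursion x_t = A x_{t-1} + C z_t with i.i.d. unit-variance scalar input (my assumed convention, with lags starting at τ = 0), the covariances in the formula are available in closed form: Cov(z_{t-τ}, x_t) = A^τ C, and Γ_x solves the discrete Lyapunov equation Γ_x = A Γ_x Aᵀ + C Cᵀ. A short sketch of the resulting direct computation, assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def mc_by_lag(A, C, max_lag):
    """Population MC_tau for x_t = A x_{t-1} + C z_t, z_t ~ iid(0, 1):
    MC_tau = (A^tau C)' Gamma_x^{-1} (A^tau C)."""
    Gx = solve_discrete_lyapunov(A, np.outer(C, C))   # Gamma_x = A Gamma_x A' + C C'
    Gx_inv = np.linalg.inv(Gx)
    out, v = [], C.copy()
    for _ in range(max_lag + 1):
        out.append(v @ Gx_inv @ v)        # Var(z) = 1, so no extra normalization
        v = A @ v
    return np.array(out)
```

Under this convention the sum over all lags telescopes to tr(Γ_x^{-1} Γ_x) = N, i.e., MC = N exactly; in floating point, however, the explicit inverse of Γ_x degrades as N grows, which is the numerical pathology the paper targets.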
Quotes
"Numerical evaluations of the memory capacity (MC) of recurrent neural networks reported in the literature often contradict well-established theoretical bounds."
"We argue that the memory gap originates from pure numerical artifacts overlooked by many previous studies and propose robust techniques that allow for accurate estimation of the memory capacity, which renders full memory results for linear RNNs in agreement with the well-known theoretical results."