Core Concepts
Stable reparameterization techniques enable state-space models to stably approximate nonlinear functionals with polynomially decaying memory, overcoming the "curse of memory" that afflicts state-space models without reparameterization.
Abstract
The paper investigates the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. The key findings are:
Theorem 3.3 proves that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships they can stably approximate must have exponentially decaying memory. This "curse of memory" arises because the recurrent weights converge to the stability boundary during training.
To address this issue, the paper introduces a class of stable reparameterization techniques for SSMs that lifts this memory limitation. Theorem 3.5 shows that with stable reparameterization, SSMs can stably approximate any nonlinear functional with decaying memory, including targets whose memory decays only polynomially.
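As a concrete illustration, a stable reparameterization maps an unconstrained trainable weight into the stable region, so no optimizer step can push the recurrence past the stability boundary. A minimal sketch, assuming a diagonal scalar recurrence and the illustrative choice λ = -exp(w) (the paper's exact reparameterization family may differ):

```python
import numpy as np

def stable_coef(w, dt=0.1):
    """Map an unconstrained weight w to a discrete recurrent
    coefficient in (0, 1). The continuous-time eigenvalue
    lam = -exp(w) is strictly negative for every real w, so the
    discretized coefficient exp(lam * dt) always lies inside the
    unit interval (illustrative choice of reparameterization)."""
    lam = -np.exp(w)
    return np.exp(lam * dt)

def ssm_scan(coef, inputs):
    """Linear recurrence h_t = coef * h_{t-1} + u_t, the scalar
    core of a (diagonal) state-space model layer."""
    h = 0.0
    for u in inputs:
        h = coef * h + u
    return h

# The recurrence stays stable for any value the optimizer reaches.
for w in (-5.0, 0.0, 5.0):
    assert 0.0 < stable_coef(w) < 1.0
```

By contrast, training the coefficient directly allows it to cross 1, at which point the hidden state diverges.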
Beyond approximation, the paper also analyzes how different parameterizations affect optimization stability. Theorem 3.6 characterizes the relationship between gradient norms and the parameterization of the recurrent weights. Based on this, the paper proposes an "optimal" reparameterization scheme that keeps the gradient-over-weight ratio bounded, improving the training stability of large SSMs.
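The chain rule makes the parameterization's effect on gradients explicit: with λ = f(w), ∂L/∂w = (∂L/∂λ)·f′(w), so the choice of f determines how a given loss sensitivity translates into a weight gradient. A toy numeric check under the illustrative choices f(w) = -exp(w) and a hypothetical loss L(λ) = -1/λ whose sensitivity grows near the stability boundary (neither is taken from the paper):

```python
import numpy as np

def f(w):
    # Illustrative reparameterization: lam = -exp(w) < 0 for all w.
    return -np.exp(w)

def toy_loss(lam):
    # Hypothetical loss whose sensitivity 1/lam**2 grows as lam -> 0,
    # i.e. as the eigenvalue approaches the stability boundary.
    return -1.0 / lam

def grad_w(w):
    # Chain rule: dL/dw = dL/dlam * f'(w), and here f'(w) = f(w).
    lam = f(w)
    dL_dlam = 1.0 / lam**2
    return dL_dlam * lam

# Finite-difference check that the chain-rule factor is applied correctly.
w, eps = 0.5, 1e-6
num = (toy_loss(f(w + eps)) - toy_loss(f(w - eps))) / (2 * eps)
assert abs(num - grad_w(w)) < 1e-6
```

The "optimal" scheme of the paper chooses f so that this rescaled gradient stays bounded relative to the weight itself.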
Numerical experiments on synthetic tasks, language modeling, and image classification validate the theoretical findings, demonstrating the advantages of stable reparameterization in both approximation and optimization.
Stats
No specific numerical results are quoted here; the key insights are derived from the paper's theoretical analysis.
Quotes
"We prove that similar to RNNs, the state-space models without reparameterization can only stably approximate targets with exponential decaying memory."
"We identify a class of stable reparameterization which achieves the stable approximation of any nonlinear functionals."
"We propose the gradient boundedness as the criterion and show the gradients are bounded by a form that depends on the parameterization."