Core Concepts
State space models (SSMs) offer a promising alternative to the attention mechanism in deep neural networks, showing superior performance on long-context tasks.
Abstract
I. Introduction
Growing interest in integrating linear state-space models (SSMs) into deep neural network architectures.
Success of Mamba, which surpasses Transformer architectures on language tasks.
Foundation models such as GPT-4 encode sequential data into a latent space, learning compressed representations.
II. State Space Models
A. Learning setup
Foundation models learn a mapping from input signals to output signals, parameterized by θ.
Suitable parameterizations render the learning problem tractable.
B. Parametrization
Continuous-time linear system dynamics with complex-valued matrices A, B, C, D.
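For reference, the continuous-time dynamics take the standard linear state-space form below (written here for a single input/output channel, which is how such models are typically applied per feature dimension; the paper's exact notation may differ):

```latex
\begin{aligned}
\dot{x}(t) &= A\,x(t) + B\,u(t), \\
y(t)       &= C\,x(t) + D\,u(t),
\end{aligned}
\qquad
A \in \mathbb{C}^{N \times N},\;
B \in \mathbb{C}^{N \times 1},\;
C \in \mathbb{C}^{1 \times N},\;
D \in \mathbb{C}^{1 \times 1}
```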
C. Discretization
Discrete-time version of the system used for implementation.
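One common choice is the zero-order-hold (ZOH) discretization with step size Δ, which yields the discrete recurrence below; this is offered as an illustrative assumption, since other schemes (e.g., the bilinear transform) are also used in the SSM literature:

```latex
\begin{aligned}
\bar{A} &= e^{\Delta A}, &
\bar{B} &= (\Delta A)^{-1}\left(e^{\Delta A} - I\right)\Delta B, \\
x_k &= \bar{A}\,x_{k-1} + \bar{B}\,u_k, &
y_k &= C\,x_k + D\,u_k .
\end{aligned}
```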
D. Structure and Initialization
Importance of initialization, particularly the choice of the state matrix A, for downstream performance.
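As one concrete example of a structured initialization of A, the sketch below builds the (negated) HiPPO-LegS matrix popularized by S4-style models; the function name `hippo_legs_matrix` and the use of NumPy are assumptions for illustration, not the paper's prescribed recipe:

```python
import numpy as np

def hippo_legs_matrix(n_state: int) -> np.ndarray:
    """Build the (negated) HiPPO-LegS matrix often used to initialize A (illustrative).

    A[n, k] = -sqrt(2n+1)*sqrt(2k+1)  if n > k
              -(n+1)                  if n == k
               0                      if n < k
    """
    idx = np.arange(n_state)
    row = np.sqrt(2 * idx + 1)[:, None]   # sqrt(2n+1) down the rows
    col = np.sqrt(2 * idx + 1)[None, :]   # sqrt(2k+1) across the columns
    a = -row * col                        # dense -sqrt(2n+1)*sqrt(2k+1)
    a = np.tril(a, k=-1)                  # keep only the strictly lower triangle
    a -= np.diag(idx + 1.0)               # diagonal entries -(n+1)
    return a

if __name__ == "__main__":
    print(hippo_legs_matrix(4))
```

In S4-style models this dense matrix is typically reduced to a diagonal or diagonal-plus-low-rank form before training, for efficiency.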
E. Implementation
Efficient learning and deployment strategies discussed.
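One efficiency property often exploited by these models (stated here as an assumption about typical implementations, not a claim about this paper's specific method) is that a linear time-invariant SSM can be evaluated either as a step-by-step recurrence, suited to streaming inference, or as a convolution with a precomputed kernel, suited to parallel training. A minimal NumPy sketch of the equivalence:

```python
import numpy as np

def ssm_recurrent(A_bar, B_bar, C, u):
    """Run the discrete SSM step by step: x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        ys.append(C @ x)
    return np.array(ys)

def ssm_convolutional(A_bar, B_bar, C, u):
    """Same output via causal convolution with the kernel K = (C B_bar, C A_bar B_bar, ...)."""
    L = len(u)
    kernel = np.array([C @ np.linalg.matrix_power(A_bar, k) @ B_bar for k in range(L)])
    return np.convolve(u, kernel)[:L]   # y_k = sum_{j<=k} K_j * u_{k-j}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, L = 4, 16
    A_bar = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)   # toy state matrix
    B_bar, C = rng.standard_normal(n), rng.standard_normal(n)
    u = rng.standard_normal(L)
    print(np.allclose(ssm_recurrent(A_bar, B_bar, C, u),
                      ssm_convolutional(A_bar, B_bar, C, u)))  # True: both views agree
```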
F. Scaffolding and Layers
Pre-processing and post-processing operations essential for model performance.
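A typical arrangement of that scaffolding (a hypothetical PyTorch sketch, not the specific block of any one proposal) wraps the SSM core with normalization, gated channel mixing, and a residual connection:

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Hypothetical scaffolding around an SSM core: norm -> SSM -> gated mixing -> residual."""

    def __init__(self, d_model: int, ssm_core: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)                 # pre-processing: normalize activations
        self.ssm = ssm_core                               # sequence mixing along the time axis
        self.out_proj = nn.Linear(d_model, 2 * d_model)   # post-processing: channel mixing
        self.glu = nn.GLU(dim=-1)                         # gated linear unit halves the width back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model)
        residual = x
        x = self.norm(x)
        x = self.ssm(x)
        x = self.glu(self.out_proj(x))
        return x + residual                               # residual connection closes the block
```

Here `ssm_core` stands for any module that maps a (batch, length, d_model) tensor to a tensor of the same shape.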
III. Overview of SSM Proposals
Various architectural choices and considerations discussed.
IV. Performance Comparison on LRA Benchmark
SSMs outperform Transformers on the long-context tasks of the LRA benchmark.
V. Concluding Remarks
Potential of SSMs to enhance foundation models highlighted.
Stats
SSMs outperform Transformers on long-context tasks such as the Long Range Arena (LRA) benchmark.