Mamba-ND: Multi-Dimensional Data Modeling with State Space Models
Core Concepts
Mamba-ND extends state space models to multi-dimensional data efficiently.
Abstract
Mamba-ND is a model that extends state space models (SSMs) to multi-dimensional data. It is compared against alternatives such as bi-directional LSTMs and S4ND. The core design choice is to unravel the input data along its different dimensions in alternating row-major orderings, so that a 1D sequence model can process ND inputs. Extensive comparisons show competitive performance on benchmarks including ImageNet-1K classification, video action recognition, global weather forecasting, and 3D medical image segmentation.
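The unraveling step can be illustrated with a minimal sketch (not the authors' code): a 2D feature grid is flattened into a 1D sequence along one of several row- or column-major orderings, here labeled `H+`, `H-`, `W+`, `W-` for forward and reversed scans along each axis.

```python
import numpy as np

def unravel_2d(x, order):
    """Flatten an (H, W, C) feature grid into an (H*W, C) sequence.

    order: one of 'H+', 'H-', 'W+', 'W-'.
      'H+' scans rows left-to-right (row-major), 'H-' reverses it;
      'W+' scans columns top-to-bottom (column-major), 'W-' reverses it.
    """
    H, W, C = x.shape
    if order[0] == 'H':
        seq = x.reshape(H * W, C)                      # row-major flatten
    else:
        seq = x.transpose(1, 0, 2).reshape(H * W, C)   # column-major flatten
    if order.endswith('-'):
        seq = seq[::-1]                                # reversed scan direction
    return seq

grid = np.arange(6).reshape(2, 3, 1)       # toy 2x3 grid, 1 channel
print(unravel_2d(grid, 'H+').ravel())      # [0 1 2 3 4 5]
print(unravel_2d(grid, 'W+').ravel())      # [0 3 1 4 2 5]
```

Each flattening gives the 1D sequence model a different notion of "previous token," which is what makes the choice of ordering a meaningful design axis.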
Directory:
Introduction
Transformers vs. State Space Models like Mamba.
ImageNet-1K Comparison
Mamba-ND outperforms ViT with fewer parameters.
Video Action Recognition
Extension of Mamba-2D to Mamba-3D for video tasks.
Global Weather Forecasting
Performance comparison with Cli-ViT using ERA5 data.
3D Medical Image Segmentation
Evaluation on BTCV dataset against UNETR and Swin-UNETR.
Meta Architectures Study
Ablation studies on layer designs and scan factorization techniques.
Effective Receptive Field Analysis
Visualization of ERF for different model designs.
Depths versus Widths Discussion
Importance of depth over width in model performance.
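The scan-factorization idea studied in the ablations can be sketched as follows: successive layers process the same grid flattened along different orderings, so information propagates across both axes. This is a hypothetical illustration, not the authors' implementation; `np.cumsum` stands in for a real 1D SSM scan.

```python
import numpy as np

# Scan orderings cycled across layers (forward/reverse along each axis).
ORDERINGS = ['H+', 'H-', 'W+', 'W-']

def flatten(x, order):
    """Flatten an (H, W) grid into a 1D sequence along the given ordering."""
    seq = x.flatten() if order[0] == 'H' else x.T.flatten()
    return seq[::-1] if order[1] == '-' else seq

def unflatten(seq, order, shape):
    """Inverse of flatten: restore the (H, W) grid."""
    H, W = shape
    if order[1] == '-':
        seq = seq[::-1]
    return seq.reshape(H, W) if order[0] == 'H' else seq.reshape(W, H).T

def forward(x, layers):
    # Each layer sees the grid serialized along a different ordering,
    # then the output is reshaped back into the grid for the next layer.
    for i, layer in enumerate(layers):
        order = ORDERINGS[i % len(ORDERINGS)]
        x = unflatten(layer(flatten(x, order)), order, x.shape)
    return x

toy = np.arange(6, dtype=float).reshape(2, 3)
out = forward(toy, [np.cumsum] * 4)   # cumsum as a stand-in causal scan
print(out.shape)                      # (2, 3)
```

Because each layer remains a plain 1D scan, this factorization keeps Mamba's linear-time complexity while still mixing information across all spatial dimensions over depth.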
Mamba-ND
Stats
Mamba-ND achieves a +3.8 point accuracy improvement over ViT-B on ImageNet-1K while using 20.7% fewer parameters.
Quotes
"Unlike convolution or self-attention operations, which can be computed in parallel across the ND input data, Mamba requires a specific ordering of the data."
"In this work, we conducted an extensive study on these possible design choices."
"Mamba consistently outperforms transformers with fewer parameters."