Core Concepts
The authors show that Mamba layers can be reformulated as a form of implicit causal self-attention, directly linking them to transformer attention layers.
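To make this link concrete, here is a minimal numerical sketch (not the authors' code) of how a selective SSM recurrence unrolls into an explicit lower-triangular, attention-like matrix; the single-channel setup, the diagonal per-step transition, and all variable names are simplifying assumptions.

    import numpy as np

    # Sketch: unroll a single-channel selective SSM into a lower-triangular
    # "hidden attention" matrix alpha so that y = alpha @ x.
    L, N = 6, 4                                  # sequence length, state size
    rng = np.random.default_rng(0)
    x = rng.normal(size=L)                       # input sequence (one channel)
    A_bar = rng.uniform(0.5, 1.0, size=(L, N))   # input-dependent diagonal transition per step
    B_bar = rng.normal(size=(L, N))              # input-dependent input projection per step
    C = rng.normal(size=(L, N))                  # input-dependent output projection per step

    # Recurrent view: h_t = A_bar_t * h_{t-1} + B_bar_t * x_t,  y_t = C_t . h_t
    h, y_rec = np.zeros(N), np.zeros(L)
    for t in range(L):
        h = A_bar[t] * h + B_bar[t] * x[t]
        y_rec[t] = C[t] @ h

    # Attention view: alpha[t, j] = C_t . (prod_{k=j+1..t} A_bar_k) * B_bar_j for j <= t
    alpha = np.zeros((L, L))
    for t in range(L):
        for j in range(t + 1):
            decay = np.prod(A_bar[j + 1:t + 1], axis=0)  # empty product -> ones
            alpha[t, j] = C[t] @ (decay * B_bar[j])

    assert np.allclose(y_rec, alpha @ x)         # both views give the same output

Here alpha plays the role of a data-dependent, unnormalized causal attention map; the paper's explainability tools operate on hidden attention matrices of this kind.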
Abstract
The study examines the attention mechanism implicit in Mamba models and compares it to that of transformers. It introduces novel explainability techniques for interpreting Mamba models, and experiments demonstrate that these methods are effective for understanding and leveraging the attention computed within Mamba layers.
The research sheds light on the hidden attention matrices within Mamba models, providing insight into their structure and function. By leveraging these matrices, new explainability tools are developed for interpreting and analyzing Mamba models. The study also compares different explanation methods under both positive and negative perturbation tests, highlighting the strengths and weaknesses of each approach.
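As an illustration of how such tools can be built on recovered per-layer attention matrices, below is a hedged sketch of the Attn-Rollout baseline; the function name, the absolute-value step (Mamba's hidden attention can be signed), and the assumption that the matrices are already averaged over channels or heads are mine, not the paper's implementation.

    import numpy as np

    def attention_rollout(attn_mats):
        """Roll a list of per-layer [L, L] attention matrices into one relevance map."""
        L = attn_mats[0].shape[-1]
        rollout = np.eye(L)
        for A in attn_mats:
            A_hat = np.abs(A) + np.eye(L)                  # residual connection; abs() is an assumption
            A_hat = A_hat / A_hat.sum(-1, keepdims=True)   # row-normalize each output position
            rollout = A_hat @ rollout                      # accumulate relevance across layers
        return rollout  # rollout[i, j]: relevance of input position j to output position i

For a classification token (or pooled output), the corresponding row of the result gives a per-token relevance map, which can then be scored with the perturbation tests reported below.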
Stats
Selective State Space Layers have shown remarkable performance in various applications [23].
Mamba models offer a 5x increase in throughput compared to Transformers for auto-regressive generation.
The equivalence between the recurrent and convolutional forms of SSM layers enables efficient training with sub-quadratic complexity (a brief kernel sketch appears after this list).
Vision-Mamba outperforms ViT in terms of accuracy and efficiency.
The Raw-Attention method on Mamba achieves better AUC values under positive perturbation than its Transformer counterpart (see the perturbation sketch after this list).
The Attn-Rollout method achieves strong mean Average Precision for both Vision-Mamba and the Vision Transformer.
Under negative perturbation, Transformer-based methods consistently outperform their Mamba-based counterparts.
The Raw-Attention method on Vision-Mamba yields higher pixel accuracy than the corresponding Vision Transformer method.
The Attn-Rollout method achieves better mean Average Precision than its Vision Transformer counterpart.
Transformer-Attribution consistently surpasses Mamba-Attribution across metrics.
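For context on the recurrent/convolutional equivalence noted above: in the time-invariant case, unrolling the recurrence h_t = A_bar h_{t-1} + B_bar x_t, y_t = C h_t gives a convolution y = x * K with kernel K = (C B_bar, C A_bar B_bar, C A_bar^2 B_bar, ..., C A_bar^{L-1} B_bar), which can be evaluated with FFT-based convolution in O(L log L) rather than O(L^2) time.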
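The positive and negative perturbation AUCs above are computed by progressively masking input tokens in order of relevance and integrating the model's accuracy curve. A minimal sketch of that protocol, assuming an illustrative model_acc_fn callback and a per-token relevance vector (both hypothetical names, not from the paper):

    import numpy as np

    def perturbation_auc(model_acc_fn, relevance, steps=10, positive=True):
        """Area under the accuracy curve as tokens are progressively masked.

        model_acc_fn(keep_mask) -> accuracy when only the tokens with
        keep_mask == True are presented to the model (hypothetical callback).
        """
        order = np.argsort(relevance)
        if positive:
            order = order[::-1]              # positive: remove MOST relevant tokens first
        n = len(relevance)
        accs = []
        for step in range(steps + 1):
            k = round(step / steps * n)      # number of tokens removed at this step
            keep = np.ones(n, dtype=bool)
            keep[order[:k]] = False
            accs.append(model_acc_fn(keep))
        return np.trapz(accs, dx=1.0 / steps)

A good explanation yields a low AUC under positive perturbation (accuracy collapses quickly when its top-ranked tokens are removed) and a high AUC under negative perturbation (accuracy is preserved when its low-ranked tokens are removed).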
Quotes
"Attention is not Explanation." - Jain & Wallace (2019)