The authors show that Mamba layers can be reformulated as an implicit form of causal self-attention, directly linking them to transformer layers.
Viewing Mamba as an attention-driven model sheds light on its inner workings and allows a more direct comparison with transformers.
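To illustrate the general idea, here is a minimal sketch (not the authors' implementation) assuming a single scalar hidden state per channel: the selective SSM recurrence h_t = A_t h_{t-1} + B_t x_t, y_t = C_t h_t unrolls into y = alpha @ x, where alpha is a lower-triangular, data-dependent matrix that plays the role of a causal attention map. The helper name `hidden_attention_matrix` is hypothetical.

```python
import numpy as np

def hidden_attention_matrix(A, B, C):
    """Build the implicit causal 'attention' matrix for one channel.

    A, B, C: arrays of shape (T,), the input-dependent SSM parameters at
    each time step (scalar hidden state assumed for simplicity).
    Returns alpha of shape (T, T), lower triangular, with
    alpha[t, j] = C[t] * prod(A[j+1:t+1]) * B[j].
    """
    T = len(A)
    alpha = np.zeros((T, T))
    for t in range(T):
        decay = 1.0                      # empty product for j == t
        for j in range(t, -1, -1):       # walk backwards, accumulating decay
            alpha[t, j] = C[t] * decay * B[j]
            decay *= A[j]
    return alpha

# Sanity check: the recurrent scan and the unrolled "attention" view agree.
rng = np.random.default_rng(0)
T = 6
A = rng.uniform(0.5, 1.0, T)   # per-step decay factors
B = rng.normal(size=T)
C = rng.normal(size=T)
x = rng.normal(size=T)

h, y_rec = 0.0, []
for t in range(T):
    h = A[t] * h + B[t] * x[t]   # selective SSM recurrence
    y_rec.append(C[t] * h)

alpha = hidden_attention_matrix(A, B, C)
assert np.allclose(y_rec, alpha @ x)     # identical outputs
```

Because alpha is lower triangular and depends on the input (through A, B, C), it behaves like a causal, data-dependent attention matrix, which is the intuition behind relating Mamba to transformer layers.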