Core Concepts
Introducing Softmax-free Transformers for efficient visual recognition tasks.
Abstract
The Softmax-Free Transformer (SOFT) proposes a novel approach to self-attention in vision transformers (ViTs), achieving linear complexity and improved computational efficiency. The method replaces softmax normalization with a Gaussian kernel, which makes the full self-attention matrix amenable to approximation via low-rank matrix decomposition. Extensive experiments show a superior trade-off between accuracy and complexity compared to existing ViT variants.
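To make the mechanism concrete, here is a minimal sketch of softmax-free attention with a Nystrom-style low-rank approximation. It is written under simplifying assumptions: landmarks are chosen by uniform subsampling and the pseudoinverse is computed exactly with np.linalg.pinv, whereas the paper obtains landmarks by pooling tokens and approximates the pseudoinverse iteratively (Newton-Raphson). Names such as soft_attention_lowrank and the exact kernel scaling are illustrative, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(a, b, scale):
    # Pairwise Gaussian similarities: exp(-||a_i - b_j||^2 / (2 * scale)).
    # Unlike softmax attention, rows are not normalized to sum to 1.
    sq_dists = (
        (a ** 2).sum(-1)[:, None]
        - 2.0 * a @ b.T
        + (b ** 2).sum(-1)[None, :]
    )
    return np.exp(-sq_dists / (2.0 * scale))

def soft_attention_lowrank(x, w_q, w_v, num_landmarks=49):
    """Softmax-free attention with a Nystrom-style low-rank approximation.

    Sketch only: uniform landmark subsampling and an exact pseudoinverse
    stand in for the paper's token pooling and iterative approximation.
    """
    n, d = x.shape
    q = x @ w_q            # queries; SOFT shares the projection, so K = Q
    v = x @ w_v
    scale = np.sqrt(d)     # scaling assumption, analogous to 1/sqrt(d)

    idx = np.linspace(0, n - 1, num_landmarks).astype(int)
    q_land = q[idx]        # m landmark tokens, m << n

    # Nystrom factors: A ~= A_nm @ pinv(A_mm) @ A_mn.
    a_nm = gaussian_kernel(q, q_land, scale)       # (n, m)
    a_mm = gaussian_kernel(q_land, q_land, scale)  # (m, m)

    # Multiply right-to-left so no n x n matrix is ever formed: O(n*m*d).
    return a_nm @ (np.linalg.pinv(a_mm) @ (a_nm.T @ v))
```

Because the three factors are multiplied right-to-left, the full n x n attention matrix never materializes, which is where the linear complexity in sequence length comes from.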
Stats
Softmax-based self-attention has quadratic complexity in the number of tokens.
SOFT significantly improves computational efficiency over softmax-based self-attention.
SOFT handles much longer token sequences with linear complexity (see the scaling sketch below).
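As a back-of-the-envelope illustration of the quadratic-versus-linear gap (entry counts only, not figures reported in the paper), the snippet below compares the full attention matrix with the three Nystrom factors, assuming a 56x56 token grid and 49 landmarks:

```python
# Illustrative memory scaling: a 56x56 feature map gives n = 3136 tokens.
n, m = 56 * 56, 49

full_attention = n * n            # softmax attention: n x n matrix
lowrank = n * m + m * m + m * n   # Nystrom factors A_nm, A_mm, A_mn

print(f"full:    {full_attention:>12,} entries")    # 9,834,496
print(f"lowrank: {lowrank:>12,} entries")           # 309,729
print(f"ratio:   {full_attention / lowrank:.1f}x")  # ~31.8x
```

The gap widens with sequence length: doubling n quadruples the full matrix but only doubles the low-rank footprint.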
Quotes
"Existing methods are either theoretically flawed or empirically ineffective for visual recognition."
"Our SOFT models can take in much longer image token sequences with linear complexity."