TaylorShift introduces a novel approach to self-attention, enabling full token-to-token interactions in linear time and space. The method enhances memory efficiency and accelerates inference for long sequences, without compromising accuracy.
Introducing an efficient attention mechanism built on a Taylor approximation of the softmax for long-sequence Transformers.
TaylorShift shifts the complexity of self-attention from quadratic to linear.
TaylorShift reformulates the Taylor softmax to enable full token-to-token interactions in linear time and space, improving memory efficiency and accelerating inference for long sequences.
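To make the linear-time claim concrete, below is a minimal NumPy sketch of attention based on a second-order Taylor approximation of the exponential, exp(x) ≈ 1 + x + x²/2. The function name, feature map, and shapes are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

def taylor_linear_attention(Q, K, V):
    """Attention via a second-order Taylor approximation of the exponential:
    exp(q.k) ~= 1 + q.k + (q.k)**2 / 2.

    Because the squared term factorizes over an outer-product feature map,
    keys and values can be aggregated once and reused for every query, so
    the N x N attention matrix is never formed.
    Q, K: (N, d); V: (N, d_v). Cost is O(N * d^2) instead of O(N^2).
    """
    d = Q.shape[1]

    def phi(X):
        # Feature map with phi(q) . phi(k) = 1 + q.k + (q.k)**2 / 2
        n = X.shape[0]
        quad = np.einsum('ni,nj->nij', X, X).reshape(n, d * d) / np.sqrt(2)
        return np.concatenate([np.ones((n, 1)), X, quad], axis=1)

    Qf, Kf = phi(Q), phi(K)          # (N, 1 + d + d^2)
    KV = Kf.T @ V                    # key/value aggregate, (1 + d + d^2, d_v)
    Z = Kf.sum(axis=0)               # normalizer aggregate, (1 + d + d^2,)
    numer = Qf @ KV                  # (N, d_v)
    denom = Qf @ Z                   # (N,) -- positive, since 1 + x + x^2/2 > 0
    return numer / denom[:, None]

# Example: a 1024-token sequence with head dimension 32
Q = np.random.randn(1024, 32)
K = np.random.randn(1024, 32)
V = np.random.randn(1024, 64)
out = taylor_linear_attention(Q, K, V)   # shape (1024, 64)
```

Because the key and value statistics are aggregated once into `KV` and `Z` and then reused for every query, the cost grows linearly with the sequence length N, at the price of a d²-sized feature map per token.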