IceFormer is a method that accelerates inference of long-sequence Transformers on CPUs by replacing dense attention with a sparse attention mechanism, without requiring any model retraining.
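IceFormer's own key-selection procedure is not detailed here, so the following is only a minimal NumPy sketch of the general idea behind this kind of sparse attention: each query attends to its k highest-scoring keys instead of all n keys, so the softmax and weighted sum cost O(k) per query once those keys are found. All function names are hypothetical, not IceFormer's API.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Standard softmax attention: every query attends to every key."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def topk_sparse_attention(Q, K, V, k):
    """Sparse variant: each query attends only to its k top-scoring keys.

    The attention matrix is mostly zeros, so per-query cost drops from
    O(n) to O(k) once the top-k keys are identified.
    """
    n, d = K.shape
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i, q in enumerate(Q):
        scores = K @ q / np.sqrt(d)             # (n,) scores for this query
        idx = np.argpartition(scores, -k)[-k:]  # indices of the top-k keys
        w = np.exp(scores[idx] - scores[idx].max())
        w /= w.sum()                            # softmax over the k keys only
        out[i] = w @ V[idx]                     # weighted sum of k values
    return out

rng = np.random.default_rng(0)
n, d = 128, 16
Q, K, V = rng.normal(size=(3, n, d))
exact = dense_attention(Q, K, V)
approx = topk_sparse_attention(Q, K, V, k=32)
```

With k = n the sparse version reproduces dense attention exactly; for k < n it is an approximation whose quality depends on how concentrated the attention weights are, which is typically what makes such methods viable without retraining.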