NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
NoMAD-Attention is an efficient attention algorithm for LLM inference on CPUs that replaces the multiply-add (MAD) operations of attention-score computation with fast in-register lookups, achieving significant speedups without sacrificing model quality.
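
At a high level, the idea is to quantize key vectors with product quantization so that each query-key dot product decomposes into a handful of table lookups: for the current query, the partial dot products against each sub-space's centroids are precomputed, packed into a SIMD register, and then fetched with a single byte-shuffle instruction instead of being multiplied and accumulated. The following is a minimal C++ sketch of such a lookup kernel under stated assumptions, not the paper's implementation: the function name `nomad_scores`, the sizes `kSubspaces` and `kKeys`, and the int8 score quantization are all illustrative choices.

```cpp
// Sketch of attention scoring via in-register lookups (assumptions above).
// Keys are pre-quantized with product quantization: each slice of a key is
// replaced by a 4-bit centroid index (16 centroids per sub-space). Per query,
// the partial dot products <query slice, centroid> are quantized to int8 and
// packed into one 128-bit register per sub-space; a single byte shuffle
// (_mm_shuffle_epi8, i.e. PSHUFB) then "looks up" scores for 16 keys at once.

#include <immintrin.h>
#include <cstdint>
#include <cstdio>

// Hypothetical sizes, not the paper's exact configuration.
constexpr int kSubspaces = 4;   // number of PQ sub-spaces per head dimension
constexpr int kKeys      = 16;  // keys scored per SIMD pass

// codes[s][k] in [0,16): centroid index of key k in sub-space s (precomputed).
// luts[s][c]: int8-quantized partial dot product of the query slice with
// centroid c in sub-space s (built once per query).
void nomad_scores(const uint8_t codes[kSubspaces][kKeys],
                  const int8_t luts[kSubspaces][16],
                  int16_t out[kKeys]) {
  __m128i acc_lo = _mm_setzero_si128();  // 16-bit accumulators, keys 0..7
  __m128i acc_hi = _mm_setzero_si128();  // 16-bit accumulators, keys 8..15
  for (int s = 0; s < kSubspaces; ++s) {
    __m128i lut = _mm_loadu_si128(reinterpret_cast<const __m128i*>(luts[s]));
    __m128i idx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(codes[s]));
    // The lookup that replaces multiply-adds: each 4-bit key code selects its
    // precomputed partial score from the table held in a register.
    __m128i part = _mm_shuffle_epi8(lut, idx);
    // Widen the int8 partial scores to int16 and accumulate across sub-spaces.
    acc_lo = _mm_add_epi16(acc_lo, _mm_cvtepi8_epi16(part));
    acc_hi = _mm_add_epi16(acc_hi,
                           _mm_cvtepi8_epi16(_mm_srli_si128(part, 8)));
  }
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), acc_lo);
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 8), acc_hi);
}

int main() {
  uint8_t codes[kSubspaces][kKeys] = {};
  int8_t luts[kSubspaces][16] = {};
  for (int c = 0; c < 16; ++c) luts[0][c] = static_cast<int8_t>(c);
  codes[0][3] = 5;  // key 3 maps to centroid 5 in sub-space 0
  int16_t out[kKeys];
  nomad_scores(codes, luts, out);
  printf("score[3] = %d\n", out[3]);  // prints 5: looked up, never multiplied
  return 0;
}
```

Because the lookup table lives in a SIMD register rather than memory, the shuffle avoids cache latency entirely and scores 16 keys per instruction, which is where the CPU speedup over conventional MAD-based attention comes from.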