Core Concepts
Jetfire proposes an efficient and accurate INT8 training method for transformers: an INT8 data flow reduces memory access overhead, while per-block quantization preserves accuracy.
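The core idea of per-block quantization is to tile a tensor and keep one INT8 scale per tile, so an outlier in one block does not degrade precision everywhere else. The sketch below is an illustrative NumPy version under assumed defaults (32x32 blocks, symmetric scaling), not Jetfire's actual tensor-core implementation:

```python
import numpy as np

def quantize_per_block(x, block_size=32):
    """Symmetric per-block INT8 quantization (illustrative sketch).

    Splits a 2-D tensor into block_size x block_size tiles and stores
    one scale per tile; block_size=32 is an assumed default, not a
    value taken from the paper.
    """
    rows, cols = x.shape
    assert rows % block_size == 0 and cols % block_size == 0
    q = np.empty_like(x, dtype=np.int8)
    scales = np.empty((rows // block_size, cols // block_size), dtype=x.dtype)
    for i in range(0, rows, block_size):
        for j in range(0, cols, block_size):
            block = x[i:i + block_size, j:j + block_size]
            # Map the block's max magnitude to the INT8 range [-127, 127].
            scale = float(np.abs(block).max()) / 127.0
            if scale == 0.0:
                scale = 1.0  # all-zero block: any scale works
            q[i:i + block_size, j:j + block_size] = np.round(block / scale).astype(np.int8)
            scales[i // block_size, j // block_size] = scale
    return q, scales

def dequantize_per_block(q, scales, block_size=32):
    """Reconstruct an approximate floating-point tensor from INT8 blocks."""
    x = q.astype(scales.dtype)
    for i in range(scales.shape[0]):
        for j in range(scales.shape[1]):
            x[i * block_size:(i + 1) * block_size,
              j * block_size:(j + 1) * block_size] *= scales[i, j]
    return x
```

Because each tile has its own scale, the quantization error inside a block is bounded by half that block's scale, which is what lets the method maintain accuracy while keeping the data flow in INT8.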
Stats
Our method offers an end-to-end training speedup of 1.42x compared to the FP16 baseline.
Quotes
"Our method features an INT8 data flow to optimize memory access."
"Per-block quantization brings practical training speedup on tensor cores."