Accumulator-Aware Post-Training Quantization: Enabling Low-Precision Inference for Large Neural Networks
This work introduces AXE, a practical framework of accumulator-aware extensions that endow layer-wise post-training quantization algorithms with overflow-avoidance guarantees, enabling low-precision inference for large neural networks.
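To make the overflow-avoidance goal concrete, the sketch below illustrates (not the paper's actual method) a standard sufficient condition for overflow-free integer accumulation: if the L1 norm of the quantized weights, scaled by the largest representable input magnitude, fits inside the signed accumulator range, no intermediate partial sum can overflow. The function name `accumulator_safe` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def accumulator_safe(weights: np.ndarray, input_bits: int, acc_bits: int,
                     signed_inputs: bool = False) -> bool:
    """Sufficient condition for overflow-free accumulation of dot(w, x)
    in a signed acc_bits-wide accumulator, over all representable
    integer inputs x. For unsigned N-bit inputs this reduces to an
    L1-norm bound on the quantized weights."""
    # Largest input magnitude: 2^N - 1 for unsigned, 2^(N-1) for signed.
    max_input = (2 ** input_bits - 1) if not signed_inputs else 2 ** (input_bits - 1)
    # Worst-case |dot(w, x)| occurs when every input saturates with the
    # sign that aligns with the corresponding weight.
    worst_case = int(np.abs(weights).sum()) * max_input
    acc_max = 2 ** (acc_bits - 1) - 1
    return worst_case <= acc_max

# Example: quantized weights with 8-bit unsigned activations.
w = np.array([60, -50, 40, -30], dtype=np.int64)
# |w|_1 = 180; worst case 180 * 255 = 45900 exceeds the int16 max of 32767.
print(accumulator_safe(w, input_bits=8, acc_bits=16))  # False
print(accumulator_safe(w, input_bits=8, acc_bits=32))  # True
```

Constraining weights so that this bound holds is what allows inference to use a narrow (e.g., 16-bit) accumulator instead of the customary 32-bit one.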