Efficient Low-Precision Neural Networks: Mitigating Overlooked Inefficiencies
Quantization of both elementwise and multiply-accumulate operations is crucial for achieving efficient low-precision neural networks. Existing efficiency metrics overlook the substantial cost of non-quantized elementwise operations, leading to suboptimal model designs.
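To make the overlooked cost concrete, here is a minimal sketch of a toy cost model. It is not the authors' metric. It assumes a BOPs-style proxy in which one operation on b_a-bit and b_b-bit operands costs b_a * b_b units, and it treats a 32-bit floating-point multiply crudely as a 32 x 32 product. The layer shape, the helpers `conv_layer`, `mac_only_cost` and `total_cost`, and the choice of one elementwise multiply per output element (for example a per-channel scale) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpGroup:
    kind: str    # "mac" or "elementwise"
    count: int   # number of operations in this group
    bits_a: int  # bit width of the first operand
    bits_b: int  # bit width of the second operand

    def cost(self) -> int:
        # Toy BOPs-style proxy (an assumption, not the paper's metric):
        # one op on b_a-bit and b_b-bit operands costs b_a * b_b units.
        return self.count * self.bits_a * self.bits_b


def mac_only_cost(groups: List[OpGroup]) -> int:
    """A metric that charges only multiply-accumulate operations."""
    return sum(g.cost() for g in groups if g.kind == "mac")


def total_cost(groups: List[OpGroup]) -> int:
    """A metric that also charges elementwise operations."""
    return sum(g.cost() for g in groups)


def conv_layer(k: int, c_in: int, c_out: int, h: int, w: int,
               mac_bits: int, ew_bits: int) -> List[OpGroup]:
    """Hypothetical k x k convolution followed by one elementwise
    multiply per output element (e.g. a per-channel scale)."""
    macs = k * k * c_in * c_out * h * w
    outputs = c_out * h * w
    return [
        OpGroup("mac", macs, mac_bits, mac_bits),
        OpGroup("elementwise", outputs, ew_bits, ew_bits),
    ]


if __name__ == "__main__":
    # 4-bit MACs, with the elementwise multiply left in 32-bit floating point.
    layer = conv_layer(k=3, c_in=64, c_out=64, h=56, w=56,
                       mac_bits=4, ew_bits=32)
    hidden = total_cost(layer) - mac_only_cost(layer)
    print(f"MAC-only cost:     {mac_only_cost(layer):,}")
    print(f"Total cost:        {total_cost(layer):,}")
    print(f"Elementwise share: {hidden / total_cost(layer):.1%}")

    # The same layer with the elementwise multiply also quantized to 4 bits.
    layer_q = conv_layer(3, 64, 64, 56, 56, mac_bits=4, ew_bits=4)
    hidden_q = total_cost(layer_q) - mac_only_cost(layer_q)
    print(f"Elementwise share (4-bit): {hidden_q / total_cost(layer_q):.2%}")
```

Under these assumptions the unquantized elementwise multiply accounts for 10% of the layer's cost, all of which a MAC-only metric ignores. Quantizing it to 4 bits shrinks its share to roughly 0.17%. Real ratios depend on the architecture, the hardware, and how elementwise operations are actually costed.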