Optimizing Convolutional Neural Network Accelerators through Mixed-Precision Quantization and Hardware-Aware Mapping
Enabling rich mixed-precision quantization schemes during the implementation of a CNN can open a previously hidden space of mappings that exploit the hardware resources more effectively than uniformly quantized layers accompanied by standard mappings. CNNs employing quantized weights and activations, together with suitable mappings, can significantly improve trade-offs among accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations.
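As a concrete illustration of the idea (a minimal sketch, not the implementation described here), the following Python snippet fake-quantizes each layer's weights to its own bit width and compares the resulting memory footprint against a uniform 8-bit baseline. The layer names, bit-width assignment, and `quantize_uniform` helper are hypothetical examples.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of x to signed `bits`-bit levels,
    returned de-quantized (fake quantization) for accuracy evaluation."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    scale = float(np.max(np.abs(x))) / qmax
    if scale == 0.0:                    # avoid division by zero for all-zero tensors
        scale = 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

# Hypothetical per-layer bit-width assignment: sensitive layers keep more
# bits, robust layers are compressed harder to save energy and memory.
bitwidths = {"conv1": 8, "conv2": 4, "fc": 2}
weights = {name: np.random.randn(64, 64).astype(np.float32)
           for name in bitwidths}

quantized = {name: quantize_uniform(w, bitwidths[name])
             for name, w in weights.items()}

# Weight-memory footprint (in bits) of the mixed-precision scheme
# versus quantizing every layer uniformly to 8 bits.
mixed = sum(w.size * bitwidths[n] for n, w in weights.items())
uniform = sum(w.size * 8 for w in weights.values())
print(f"mixed-precision: {mixed} bits vs. uniform 8-bit: {uniform} bits")
```

In such a scheme, the per-layer bit widths become additional degrees of freedom that a hardware-aware mapping search can exploit, since smaller operands change which loop orderings and buffer allocations fit on the accelerator.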