Enabling rich mixed-precision quantization schemes during the implementation of a CNN can open a previously hidden space of mappings that utilize the hardware resources more effectively than uniformly quantized layers accompanied by standard mappings. CNNs that combine quantized weights and activations with suitable mappings can significantly improve trade-offs among accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations.
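The following is a minimal sketch of what per-layer mixed-precision quantization can look like in practice: each layer's weights are uniformly quantized, but the bit width is chosen per layer rather than globally. The function name, the symmetric per-tensor quantizer, and the example bit-width assignment are illustrative assumptions, not details taken from the work summarized above.

```python
# Minimal sketch of per-layer (mixed-precision) uniform quantization.
# Assumption: a simple symmetric, per-tensor quantizer; real schemes often
# use per-channel scales and learned bit-width assignments.
import numpy as np

def quantize_uniform(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a tensor to the given bit width."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(x)) / levels      # per-tensor scale factor
    q = np.clip(np.round(x / scale), -levels, levels)
    return q * scale                        # dequantized ("fake-quantized") values

# Different layers receive different bit widths, e.g. keeping the first and
# last layers at higher precision than the middle layers (assumed policy).
layer_weights = {
    "conv1": np.random.randn(64, 3, 3, 3),
    "conv2": np.random.randn(128, 64, 3, 3),
    "fc":    np.random.randn(10, 128),
}
bit_widths = {"conv1": 8, "conv2": 4, "fc": 8}

quantized = {name: quantize_uniform(w, bit_widths[name])
             for name, w in layer_weights.items()}
```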
Product quantization (PQ) can eliminate multiply-accumulate operations in deep neural networks (DNNs) by replacing them with memory lookups of pre-computed dot products, offering potential for significant inference acceleration. This work presents the first comprehensive study of PQ for DNN acceleration, including the design of a custom hardware accelerator (PQA) that can achieve up to 3.1x speedup over a highly optimized conventional DNN accelerator.
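To make the lookup-based mechanism concrete, the sketch below shows how product quantization can replace the multiply-accumulates of a dot product with table lookups: the input vector is split into subvectors, each subvector is encoded as its nearest codeword, and the dot product is approximated by summing precomputed codeword-weight partial products. The codebook sizes, variable names, and the use of random (rather than learned) codebooks are illustrative assumptions about the general technique, not a description of the PQA hardware design.

```python
# Minimal sketch of PQ-based dot-product approximation via lookup tables.
# Assumption: codebooks would normally be learned (e.g. with k-means);
# random codebooks are used here only to keep the example self-contained.
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 64, 8, 16          # input dim, number of subspaces, codewords per subspace
d = D // M                   # dimensionality of each subvector

w = rng.standard_normal(D)   # one weight vector (e.g. one output neuron)

# Per-subspace codebooks of shape (M, K, d).
codebooks = rng.standard_normal((M, K, d))

# Offline: precompute the dot product of every codeword with the matching
# weight subvector, giving an M x K lookup table of partial results.
lut = np.einsum('mkd,md->mk', codebooks, w.reshape(M, d))

def pq_dot(x: np.ndarray) -> float:
    """Approximate w.dot(x) using codeword lookups instead of MACs."""
    x_sub = x.reshape(M, d)
    total = 0.0
    for m in range(M):
        # Encode: pick the nearest codeword for this subvector.
        idx = np.argmin(np.linalg.norm(codebooks[m] - x_sub[m], axis=1))
        # Lookup: accumulate the precomputed partial dot product.
        total += lut[m, idx]
    return total

x = rng.standard_normal(D)
print(pq_dot(x), w.dot(x))   # approximate vs. exact dot product
```

At inference time only the encoding (nearest-codeword search) and M table lookups remain per output; the weight-activation multiplications themselves are amortized into the offline table construction, which is what a dedicated accelerator can exploit.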