Key concepts
Enabling rich mixed-precision quantization schemes during the implementation of a CNN can open a previously hidden space of mappings that utilize the hardware resources more effectively than uniformly quantized layers accompanied by standard mappings. CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations.
Summary
The paper proposes a framework for optimizing the implementation of convolutional neural networks (CNNs) on hardware accelerators. The key components are:
- Mapping engine: The authors extend the Timeloop tool to support mixed-precision quantization, allowing the exploration of a wider design space of CNN-to-hardware mappings and thus better utilization of the hardware resources (see the tiling sketch after the Statistics section).
- Training engine: The authors use quantization-aware training (QAT) in PyTorch to retrain the quantized CNN models and recover the accuracy lost to quantization (a minimal fake-quantization sketch follows this list).
- Search engine: The authors use a multi-objective genetic algorithm (NSGA-II) to find Pareto-optimal configurations that balance CNN error, energy consumption, and memory requirements (a Pareto-selection sketch also follows this list).
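Below is a minimal sketch of the fake-quantization step behind quantization-aware training, using a straight-through estimator for the backward pass. The module names, bit-widths, and min/max scaling are illustrative assumptions rather than the paper's exact QAT configuration; in the paper, the per-layer bit-widths are chosen by the search engine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuant(nn.Module):
    """Uniform fake-quantization with a straight-through estimator (STE)."""
    def __init__(self, bits: int = 8):
        super().__init__()
        self.levels = 2 ** bits - 1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale to [0, 1], snap to the quantization grid, scale back.
        lo, hi = x.min().detach(), x.max().detach()
        scale = (hi - lo).clamp(min=1e-8)
        q = torch.round((x - lo) / scale * self.levels) / self.levels * scale + lo
        # STE: quantized values in the forward pass, identity gradient backward.
        return x + (q - x).detach()

class QuantConv(nn.Module):
    """Convolution whose weights and input activations pass through fake-quant nodes."""
    def __init__(self, cin: int, cout: int, w_bits: int = 8, a_bits: int = 8):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)
        self.wq = FakeQuant(w_bits)   # weight quantizer
        self.aq = FakeQuant(a_bits)   # activation quantizer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(self.aq(x), self.wq(self.conv.weight), padding=1)
```

In the full flow, fake-quant wrappers of this kind would replace the convolutions of MobileNetV1/V2, after which the network is fine-tuned to recover accuracy.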
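The search engine's multi-objective selection can be illustrated with a plain Pareto-dominance filter over candidate per-layer bit-width assignments. This shows only the non-dominated-selection idea at the heart of NSGA-II; the full algorithm adds crowding-distance ranking, crossover, and mutation over many generations. The objective functions below are hypothetical placeholders, whereas the paper obtains the error from the training engine and the energy and memory figures from the mapping engine.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, evaluate):
    """Return the non-dominated configurations of a population."""
    scored = [(cfg, evaluate(cfg)) for cfg in population]
    return [cfg for cfg, f in scored
            if not any(dominates(g, f) for _, g in scored if g is not f)]

def evaluate(bitwidths):
    """Placeholder objectives; the framework would query its training and mapping engines."""
    error = sum(1.0 / b for b in bitwidths)    # lower precision -> higher error proxy
    energy = sum(b * b for b in bitwidths)     # higher precision -> higher energy proxy
    memory = sum(b for b in bitwidths)         # stored bits per layer (footprint proxy)
    return error, energy, memory

random.seed(0)
num_layers = 10
population = [[random.choice([4, 8, 16]) for _ in range(num_layers)] for _ in range(50)]
front = pareto_front(population, evaluate)
print(f"{len(front)} non-dominated bit-width assignments out of {len(population)}")
```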
The experiments are conducted on two CNN models (MobileNetV1 and MobileNetV2) and two hardware accelerators (Eyeriss and Simba). The results show that the proposed method can achieve up to 37% energy savings without any accuracy drop, compared to less carefully optimized implementations.
The key insights are:
- Mixed-precision quantization enables a larger space of valid mappings, leading to more efficient utilization of the hardware.
- The synergy between quantization and mapping is crucial for optimizing the trade-offs between accuracy, energy, and memory.
- The proposed framework can be used to guide the design of new hardware accelerators for CNNs.
Statistics
Number of valid mappings for the second convolutional layer of MobileNet on the Eyeriss and Simba accelerators:

| Weights / activations | Eyeriss | Simba |
| --- | --- | --- |
| 16-bit / 16-bit | 11,778 | 80,835 |
| 8-bit / 8-bit | 15,021 | 110,032 |
| 8-bit / 4-bit | 15,054 | 111,090 |
| 4-bit / 4-bit | 16,417 | 127,214 |
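Why lower precision enlarges the space of valid mappings can be seen with a toy buffer-capacity model: smaller data words let more tile shapes fit into a fixed on-chip buffer, so more loop tilings become legal. The layer dimensions, buffer size, and footprint model below are illustrative assumptions and far simpler than the extended Timeloop's mapping search (which also fixes loop orders, spatial unrolling, and multiple memory levels), so the counts do not reproduce the numbers above; only the trend carries over.

```python
from itertools import product

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def count_valid_tilings(C, K, R, S, w_bits, a_bits, buffer_bits):
    """Count (channel, filter) tile sizes whose weight and input tiles fit in one buffer."""
    valid = 0
    for c_t, k_t in product(divisors(C), divisors(K)):
        weight_bits = c_t * k_t * R * S * w_bits   # weight-tile footprint
        input_bits = c_t * R * S * a_bits          # toy input-tile footprint
        if weight_bits + input_bits <= buffer_bits:
            valid += 1
    return valid

# Toy layer and buffer sizes (not the paper's): lower precision leaves room
# for more tile shapes, i.e. more valid mappings for the mapper to choose from.
for w_bits, a_bits in [(16, 16), (8, 8), (8, 4), (4, 4)]:
    n = count_valid_tilings(C=64, K=128, R=3, S=3,
                            w_bits=w_bits, a_bits=a_bits, buffer_bits=2**17)
    print(f"{w_bits}-bit weights / {a_bits}-bit activations: {n} valid tilings")
```

As the precision drops, the printed counts do not decrease, mirroring the trend measured for Eyeriss and Simba above.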
Quotes
"Enabling rich mixed quantization schemes during the implementation can open a previously hidden space of mappings that utilize the hardware resources more effectively."
"CNNs utilizing quantized weights and activations and suitable mappings can significantly improve trade-offs among the accuracy, energy, and memory requirements compared to less carefully optimized CNN implementations."