ONNXPruner is a versatile pruning adapter designed to streamline the application of pruning algorithms across diverse deep learning models and hardware platforms by leveraging the ONNX format.
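The paper describes the adapter at a high level; as a rough illustration of what framework-agnostic pruning on an ONNX graph can look like, the following minimal sketch (not the actual ONNXPruner API; the function name and pruning criterion are hypothetical) masks the lowest L1-norm output channels of every Conv weight while keeping tensor shapes valid:

```python
# Hypothetical sketch: channel masking directly on an ONNX graph.
import onnx
from onnx import numpy_helper
import numpy as np

def prune_conv_channels(model_path: str, out_path: str, ratio: float = 0.3) -> None:
    model = onnx.load(model_path)
    inits = {init.name: init for init in model.graph.initializer}

    for node in model.graph.node:
        if node.op_type != "Conv" or len(node.input) < 2:
            continue
        w_name = node.input[1]                      # Conv weight initializer
        if w_name not in inits:
            continue
        w = numpy_helper.to_array(inits[w_name])    # shape: (out_ch, in_ch, kH, kW)
        scores = np.abs(w).reshape(w.shape[0], -1).sum(axis=1)  # per-channel L1 norm
        n_prune = int(ratio * w.shape[0])
        if n_prune == 0:
            continue
        prune_idx = np.argsort(scores)[:n_prune]    # weakest channels
        w_pruned = w.copy()
        w_pruned[prune_idx] = 0.0                   # mask channels; shapes stay intact
        inits[w_name].CopyFrom(numpy_helper.from_array(w_pruned.astype(w.dtype), name=w_name))

    onnx.save(model, out_path)
```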
This paper proposes a differentiable quantization strategy search (DQSS) framework that automatically assigns an optimal quantization strategy to each layer of a deep neural network, exploiting the complementary strengths of different quantization algorithms to keep the accuracy of the quantized model as high as possible.
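To make the per-layer search concrete, here is a minimal PyTorch sketch assuming a DARTS-style softmax relaxation over candidate quantizers (the specific quantizers, layer type, and hyperparameters are illustrative assumptions, not the paper's exact formulation):

```python
# Sketch: each layer mixes candidate quantizers with learnable architecture logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

def uniform_quant(w, bits=4):
    # Symmetric uniform quantizer with a straight-through estimator.
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-8
    q = torch.round(w / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
    return w + (q - w).detach()

def log_quant(w, bits=4):
    # Power-of-two (logarithmic) quantizer, also with a straight-through estimator.
    sign, mag = torch.sign(w), w.abs().clamp(min=1e-8)
    q = sign * 2.0 ** torch.round(torch.log2(mag)).clamp(-(2 ** bits), 0)
    return w + (q - w).detach()

class SearchableQuantLinear(nn.Module):
    """Linear layer whose weights are a softmax-weighted mix of candidate quantizers."""
    def __init__(self, in_f, out_f, candidates=(uniform_quant, log_quant)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.candidates = candidates
        # Architecture parameters: one logit per candidate strategy for this layer.
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=0)
        w_mix = sum(p * q(self.weight) for p, q in zip(probs, self.candidates))
        return F.linear(x, w_mix)

# After the search converges, each layer keeps only its highest-weighted strategy:
# best_quantizer = layer.candidates[layer.alpha.argmax()]
```

Here the architecture logits `alpha` are trained jointly with (or alternately to) the weights, so gradient descent itself selects the per-layer strategy.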
This survey provides a comprehensive overview of methods and techniques for designing efficient, lightweight deep learning models that can be deployed on resource-constrained devices, including mobile phones and microcontrollers. It covers the evolution of lightweight CNN architectures, compression techniques like pruning and quantization, and hardware acceleration strategies to enable effective deployment on edge devices.
Bias Compensation (BC) directly minimizes the output error caused by quantization, enabling ultra-low-precision quantization without model fine-tuning.
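As a worked illustration of the idea (a NumPy sketch under the assumption, implied by the stated goal, that the compensating bias minimizes the mean squared output error on a small calibration set; the closed-form minimizer is then the mean quantization-induced output error):

```python
# Sketch: compensate low-precision weight quantization with a corrective bias term.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=2):
    # Crude symmetric uniform quantizer to emulate ultra-low-precision weights.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale

# Full-precision linear layer: y = x @ W.T + b
W = rng.normal(size=(64, 128)) * 0.1
b = rng.normal(size=64) * 0.01
Wq = quantize(W, bits=2)

X_calib = rng.normal(size=(512, 128))     # calibration activations
err = X_calib @ (W - Wq).T                # per-sample output error from quantization
delta_b = err.mean(axis=0)                # bias that minimizes the mean squared error

y_fp = X_calib @ W.T + b
y_q  = X_calib @ Wq.T + b
y_bc = X_calib @ Wq.T + (b + delta_b)     # quantized layer with compensated bias

print("output MSE without BC:", np.mean((y_fp - y_q) ** 2))
print("output MSE with    BC:", np.mean((y_fp - y_bc) ** 2))
```

Because only the existing bias vector is adjusted, the correction adds no inference overhead and requires no fine-tuning of the weights.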