Key Concepts
QuantTune proposes an outlier-driven fine-tuning method to address accuracy drops in post-training quantization of Transformer-based models. By adjusting weights based on outlier activations, it effectively mitigates the negative impact of outliers on model inference accuracy.
Abstract
QuantTune introduces a novel approach to optimize model quantization by addressing outliers and dynamic range issues. The method significantly reduces accuracy drops in quantized models, showcasing improvements across various Transformer architectures. By seamlessly integrating into the fine-tuning process, QuantTune offers hardware-independent solutions for efficient model compression and acceleration.
The study focuses on the challenges faced during post-training linear quantization of Transformer-based models. It reveals that precision loss due to outliers contributes significantly to quantization errors, leading to reduced inference accuracy. QuantTune adjusts weights based on outlier deviations to constrain dynamic ranges effectively, resulting in improved model performance after quantization.
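To make the outlier problem concrete, the following is a minimal sketch (not the paper's implementation) of why a single extreme activation hurts symmetric post-training linear quantization: the scale is set by the maximum absolute value, so one outlier coarsens the step size for every other value.

```python
import numpy as np

def linear_quantize(x, bits=8):
    # Symmetric linear quantization: the scale is tied to the max
    # absolute value, so one outlier stretches the step size for all values.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 10_000)          # well-behaved activations
err_clean = np.abs(acts - linear_quantize(acts)).mean()

with_outlier = np.append(acts, 100.0)        # one extreme activation
err_outlier = np.abs(with_outlier - linear_quantize(with_outlier)).mean()
```

Running this, `err_outlier` is far larger than `err_clean`, illustrating how an inflated dynamic range translates directly into precision loss for the bulk of activations.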
The research demonstrates that managing activation outliers is crucial for accurate post-training quantization. By employing an outlier-driven loss function during fine-tuning, QuantTune successfully narrows dynamic ranges and minimizes precision errors caused by outliers. This approach enhances model resilience against quantization-induced errors without requiring complex hardware or extensive calibration efforts.
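The outlier-driven loss idea can be sketched as a penalty term added to the task loss during fine-tuning. This is a hypothetical illustration, not the paper's exact formulation: the threshold rule (mean plus `k` standard deviations) and the weighting factor `lam` are assumptions chosen for clarity.

```python
import numpy as np

def outlier_driven_loss(activations, task_loss, k=3.0, lam=0.1):
    """Hypothetical sketch: penalize activation magnitudes that deviate
    more than k standard deviations from the mean, nudging fine-tuning
    to narrow the dynamic range ahead of quantization."""
    mu = activations.mean()
    sigma = activations.std()
    threshold = mu + k * sigma
    # Only the portion of each activation beyond the threshold is penalized.
    excess = np.maximum(np.abs(activations) - threshold, 0.0)
    outlier_penalty = (excess ** 2).mean()
    return task_loss + lam * outlier_penalty
```

With no outliers the penalty vanishes and the task loss is unchanged; an extreme activation adds a positive penalty, so gradient-based fine-tuning is pushed toward a tighter dynamic range.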
QuantTune's effectiveness is validated through experiments across a range of Transformer-based models, including ViT, DeiT, Swin-Transformer, BERT, and OPT. The method outperforms state-of-the-art calibration methods, reducing accuracy drops at multiple bit-widths for both vision and language models. Additionally, QuantTune offers a cost-effective path to model quantization that integrates seamlessly with standard computing platforms.
Statistics
QuantTune reduces accuracy drops by 12.09% at 8-bit quantization and 33.8% at 7-bit compared to top calibration methods.
The method outperforms state-of-the-art solutions by over 18.84% across ViT models.
Quotes
"Our approach showcases significant improvements in post-training quantization across a range of Transformer-based models."
"QuantTune reduces accuracy drops by over 18.84% compared to existing state-of-the-art methods."