QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning
Core Concepts
QuantTune, an outlier-driven fine-tuning method, effectively mitigates the negative impact of activation outliers on inference accuracy in quantized Transformer-based models.
Abstract
Transformer-based models face challenges in post-training quantization that lead to accuracy drops. QuantTune addresses this by adjusting weights based on the deviation of outlier activations, constraining the dynamic ranges of the affected activations. It integrates seamlessly into the fine-tuning process without adding extra complexity.
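To make the mechanism concrete, below is a minimal sketch of outlier-driven fine-tuning in PyTorch, under assumed details: a regularizer that penalizes activations straying beyond mean ± k·σ, so gradient updates pull the weights toward a tighter dynamic range. The penalty form, the hook-based collection, and the hyperparameters `k` and `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Hypothetical outlier-driven regularizer: penalize activation values that
# fall outside mean ± k·std, so gradient updates pull the weights toward a
# tighter dynamic range for post-training quantization to cover.
def outlier_range_penalty(x: torch.Tensor, k: float = 3.0) -> torch.Tensor:
    mu, sigma = x.mean(), x.std()
    excess = torch.relu(x - (mu + k * sigma)) + torch.relu((mu - k * sigma) - x)
    return excess.pow(2).mean()

# Collect the penalty from each layer's output via forward hooks.
penalties = []

def collect_penalty(module, inputs, output):
    penalties.append(outlier_range_penalty(output))

model = nn.Sequential(nn.Linear(128, 256), nn.GELU(), nn.Linear(256, 10))
for m in model.modules():
    if isinstance(m, nn.Linear):
        m.register_forward_hook(collect_penalty)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lam = 0.1  # regularization strength (assumed hyperparameter)

# One fine-tuning step on dummy data: task loss plus range penalty.
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
optimizer.zero_grad()
penalties.clear()
loss = criterion(model(x), y) + lam * torch.stack(penalties).sum()
loss.backward()
optimizer.step()
```

Because the penalty is just an extra loss term computed from ordinary forward activations, it adds no new operators to the deployed model, which is consistent with the abstract's claim that the method folds into standard fine-tuning.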
Stats
QuantTune reduces accuracy drops by an average of 12.09% at 8-bit quantization and 33.8% at 7-bit quantization compared to top calibration methods.
The approach demonstrates consistent improvements in post-training quantization across a range of Transformer-based models.
Quotes
"Our study focuses on uncovering the underlying causes of these accuracy drops and proposing a quantization-friendly fine-tuning method, QuantTune."
"QuantTune adjusts weights based on the deviation of outlier activations and effectively constrains the dynamic ranges of the problematic activations."