QuantTune is an outlier-driven fine-tuning method that addresses the accuracy drops caused by post-training quantization of Transformer-based models. By adjusting weights based on outlier activations, it mitigates the negative impact of activation outliers on inference accuracy, yielding significant accuracy improvements in the quantized models.
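To make the idea concrete, the outlier-driven objective can be sketched as a regularization term that penalizes activations falling far outside the bulk of the distribution, so fine-tuning nudges the weights toward a narrower, quantization-friendly activation range. This is a minimal illustrative sketch, not the paper's exact formulation; the function name, the k-sigma threshold, and the squared-excess penalty are assumptions made here for clarity.

```python
import numpy as np

def outlier_penalty(activations: np.ndarray, k: float = 3.0) -> float:
    """Hypothetical outlier regularizer (illustrative, not QuantTune's exact loss).

    Penalizes activation magnitudes that exceed k standard deviations
    above the mean, so adding this term to the task loss during
    fine-tuning pushes weights toward outlier-free activation ranges.
    """
    mu = activations.mean()
    sigma = activations.std()
    threshold = mu + k * sigma
    # Only the portion of each activation beyond the threshold is penalized.
    excess = np.maximum(np.abs(activations) - threshold, 0.0)
    return float((excess ** 2).mean())
```

In a fine-tuning loop, a term like `total_loss = task_loss + lam * outlier_penalty(acts)` would trade off task accuracy against activation range, shrinking the dynamic range that the post-training quantizer must cover.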