Mitigating Outlier Channels in Language Model Quantization with Activation Regularization
Outlier channels in language models emerge early in training and occur more frequently in layers with residual streams. Regularizing the output activations via quantization-aware training and the input activations via kurtosis regularization can enable efficient 4-bit quantization of language models.
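As an illustration of the two ingredients named above, here is a minimal PyTorch sketch of a kurtosis penalty on activations and straight-through fake quantization for quantization-aware training. This is not the paper's implementation: the function names, the kurtosis target, and the clipping scheme are assumptions for the sketch.

```python
import torch

def kurtosis_penalty(x: torch.Tensor, target: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    """Penalty on the empirical kurtosis of a batch of activations.

    Kurtosis = E[(x - mu)^4] / (E[(x - mu)^2])^2. Heavy-tailed (high-kurtosis)
    activations are the hallmark of outlier channels: a few extreme values force
    large quantization scales and degrade low-bit accuracy. The Gaussian target
    of 3.0 here is an illustrative choice, not the paper's setting.
    """
    x = x.float().flatten()
    centered = x - x.mean()
    var = centered.pow(2).mean()
    kurt = centered.pow(4).mean() / (var.pow(2) + eps)
    return (kurt - target) ** 2

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-tensor fake quantization with a straight-through estimator.

    The forward pass uses quantized-then-dequantized values; the backward pass
    treats quantization as the identity (the standard QAT trick), so gradients
    flow through as if no rounding had occurred.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().amax().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax - 1, qmax) * scale
    return x + (q - x).detach()
```

In a training loop, one would add a weighted `kurtosis_penalty` on the inputs of the linear layers being quantized and apply `fake_quantize` to their outputs; the loss weight and which layers to regularize are design choices not specified in this sketch.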