CBQ introduces a cross-block reconstruction method for post-training quantization of large language models, calibrating spans of adjacent transformer blocks jointly rather than one block at a time, and outperforms existing methods at low bitwidths.
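A minimal sketch of the cross-block idea, under stated assumptions: the quantizer is a toy learnable per-tensor weight scale, only top-level `nn.Linear` children are replaced, blocks are assumed to map a tensor to a tensor, and the names `FakeQuantLinear` and `cross_block_reconstruct` are hypothetical rather than CBQ's actual API. The point is that scales for a span of consecutive blocks are optimized against the span's joint full-precision output, not block by block.

```python
# Toy illustration of cross-block reconstruction (hypothetical names, not
# CBQ's code): weight-quantization scales for a *span* of consecutive
# blocks are learned against the span's joint full-precision output,
# instead of calibrating each block in isolation.
import copy

import torch
import torch.nn as nn


class FakeQuantLinear(nn.Module):
    """Linear layer with a learnable per-tensor weight scale (toy quantizer)."""

    def __init__(self, linear: nn.Linear, n_bits: int = 4):
        super().__init__()
        self.weight = linear.weight.detach()
        self.bias = linear.bias.detach() if linear.bias is not None else None
        self.qmax = 2 ** (n_bits - 1) - 1
        # Initialize the scale from the weight range; it is then learned.
        self.log_scale = nn.Parameter((self.weight.abs().max() / self.qmax).log())

    def forward(self, x):
        scale = self.log_scale.exp()
        w_int = (self.weight / scale).clamp(-self.qmax - 1, self.qmax)
        w_q = (w_int.round() - w_int).detach() + w_int  # straight-through round
        return nn.functional.linear(x, w_q * scale, self.bias)


def cross_block_reconstruct(fp_blocks, calib_x, n_bits=4, steps=200, lr=1e-3):
    """Jointly tune weight scales across a span of blocks."""
    q_blocks = copy.deepcopy(list(fp_blocks))
    for blk in q_blocks:  # replace top-level Linear children only (sketch)
        for name, mod in list(blk.named_children()):
            if isinstance(mod, nn.Linear):
                setattr(blk, name, FakeQuantLinear(mod, n_bits))
    with torch.no_grad():  # full-precision target for the whole span
        target = calib_x
        for blk in fp_blocks:
            target = blk(target)
    scales = [m.log_scale for blk in q_blocks
              for m in blk.modules() if isinstance(m, FakeQuantLinear)]
    opt = torch.optim.Adam(scales, lr=lr)
    for _ in range(steps):
        out = calib_x
        for blk in q_blocks:
            out = blk(out)
        loss = nn.functional.mse_loss(out, target)  # error over the whole span
        opt.zero_grad()
        loss.backward()
        opt.step()
    return q_blocks
```

Because the loss is taken over the joint output, a rounding error introduced in one block can be compensated by the scales of the blocks that follow it, which is what per-block calibration cannot do.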
QLLM is an accurate and efficient low-bitwidth post-training quantization method that addresses the challenge of activation outliers in large language models.
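One known way to tame activation outliers, in the spirit of QLLM's channel disassembly (this sketch is not its actual implementation, and the helper name `disassemble_channel` is hypothetical): split an outlier input channel into `t` equal sub-channels and replicate the matching weight column, which leaves the layer's output mathematically unchanged while shrinking that channel's activation range t-fold.

```python
# Minimal sketch of channel disassembly for activation outliers
# (hypothetical helper, not QLLM's implementation): an outlier input
# channel is split into t equal sub-channels and the matching weight
# column is replicated, so the layer output is preserved exactly while
# the activation range of that channel shrinks by a factor of t.
import torch
import torch.nn as nn


def disassemble_channel(linear: nn.Linear, x: torch.Tensor, idx: int, t: int):
    """Split input channel `idx` of `linear` and of activations `x` into `t` parts (t >= 2)."""
    w = linear.weight.data                                # (out_features, in_features)
    col = w[:, idx:idx + 1]
    new_w = torch.cat([w, col.repeat(1, t - 1)], dim=1)   # duplicate the column t-1 times
    new_linear = nn.Linear(new_w.shape[1], new_w.shape[0],
                           bias=linear.bias is not None)
    new_linear.weight.data = new_w
    if linear.bias is not None:
        new_linear.bias.data = linear.bias.data.clone()
    x = x.clone()
    part = x[..., idx:idx + 1] / t                        # each sub-channel carries 1/t
    x[..., idx] = part.squeeze(-1)
    new_x = torch.cat([x, part.repeat_interleave(t - 1, dim=-1)], dim=-1)
    return new_linear, new_x


# Outputs match before and after disassembly, but the outlier channel's
# dynamic range drops t-fold, making the activations easier to quantize.
layer = nn.Linear(4, 3)
x = torch.randn(2, 4)
x[:, 1] *= 50                                             # channel 1 is an outlier
new_layer, new_x = disassemble_channel(layer, x, idx=1, t=4)
assert torch.allclose(layer(x), new_layer(new_x), atol=1e-5)
```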