Evaluating the Performance of Low-Bit Quantized LLAMA3 Models: An Empirical Study
This empirical study comprehensively evaluates the performance of low-bit quantized LLAMA3 models across a range of post-training quantization and LoRA-finetuning techniques, revealing significant challenges in maintaining model accuracy at ultra-low bit-widths.