Towards Highly Efficient 2-bit Post-Training Uniform Quantization of Large Models via Decoupling Parameters into Integer and Floating Points
decoupleQ abandons the traditional heuristic quantization paradigm: it decouples the model parameters into an integer part and a floating-point part, recasting quantization as a constrained optimization problem. This yields a substantial gain in model accuracy, especially at very low bit-widths.
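To make the decoupling idea concrete, here is a minimal illustrative sketch (not the paper's exact algorithm): model each weight group as w ≈ s · w_int + z, where w_int is constrained to 2-bit integers and (s, z) are free floating-point values, and alternate between solving the integer subproblem (rounding) and the floating-point subproblem (least squares). All function and variable names here are hypothetical.

```python
import numpy as np

def decouple_quantize(w, bits=2, iters=10):
    # Illustrative sketch, not the paper's exact method:
    # approximate w ~= s * w_int + z with w_int in [0, 2**bits - 1]
    # (the integer part) and (s, z) as floating-point parameters.
    qmax = 2**bits - 1
    s = float(w.max() - w.min()) / qmax
    if s == 0.0:
        s = 1.0
    z = float(w.min())
    for _ in range(iters):
        # Integer subproblem: round under the current (s, z).
        w_int = np.clip(np.round((w - z) / s), 0, qmax)
        # Floating-point subproblem: least-squares fit of (s, z)
        # for the fixed integer grid, i.e. regress w on w_int.
        A = np.stack([w_int, np.ones_like(w_int)], axis=-1).reshape(-1, 2)
        (s, z), *_ = np.linalg.lstsq(A, w.reshape(-1), rcond=None)
    return w_int.astype(np.int8), s, z
```

Because the floating-point step solves a least-squares problem with an intercept, the 2-bit reconstruction error can never exceed that of a constant approximation, which hints at why treating (s, z) as optimization variables rather than heuristically chosen statistics helps at low bit-widths.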