FlattenQuant: Efficient Post-Training Quantization for Large Language Models
FlattenQuant is a post-training quantization method for large language models. It flattens channels containing outlier values so that tensors become amenable to low-bit per-tensor quantization, achieving significant inference speedup and memory reduction while maintaining accuracy.
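To make the flattening idea concrete, here is a minimal sketch (not the paper's reference implementation; the function names, the threshold heuristic, and the splitting rule are assumptions). An activation channel whose peak magnitude exceeds a threshold is split into several scaled-down copies, and the matching weight rows are duplicated so the matrix product is unchanged; the flattened tensor then has a smaller dynamic range, so a single per-tensor scale wastes fewer quantization levels.

```python
import numpy as np

def flatten_channels(x, w, threshold):
    """Split activation channels of x whose max |value| exceeds
    `threshold` into k scaled-down copies, duplicating the matching
    rows of w so that x @ w is preserved. Hypothetical sketch of the
    channel-flattening idea."""
    new_x_cols, new_w_rows = [], []
    for c in range(x.shape[1]):
        col, row = x[:, c], w[c, :]
        peak = np.abs(col).max()
        # number of copies needed to bring the channel under threshold
        k = max(1, int(np.ceil(peak / threshold)))
        for _ in range(k):
            new_x_cols.append(col / k)  # each copy carries 1/k of the value
            new_w_rows.append(row)      # weight row duplicated k times
    return np.stack(new_x_cols, axis=1), np.stack(new_w_rows, axis=0)

def per_tensor_quant(x, n_bits=8):
    """Symmetric per-tensor quantization: one scale for the whole tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale
```

Because the duplicated weight rows are summed back in the matmul, flattening is numerically exact before quantization; the benefit appears afterward, when the reduced peak magnitude yields a finer per-tensor scale.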