The LLaMA3-70B model series exhibits a unique vulnerability to per-channel 8-bit quantization, in contrast to other large language models that demonstrate robust performance under the same quantization scheme.
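To make the scheme concrete, here is a minimal NumPy sketch of symmetric per-channel 8-bit weight quantization, i.e. one scale per output channel; the function names and the symmetric round-to-nearest scheme are illustrative assumptions rather than the exact configuration used in that study.

```python
import numpy as np

def quantize_per_channel_int8(W: np.ndarray):
    """Symmetric per-channel INT8 quantization: one scale per output channel (row)."""
    scales = np.abs(W).max(axis=1, keepdims=True) / 127.0   # per-row scale
    scales = np.where(scales == 0, 1e-8, scales)            # guard against all-zero rows
    W_q = np.clip(np.round(W / scales), -127, 127).astype(np.int8)
    return W_q, scales

def dequantize(W_q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return W_q.astype(np.float32) * scales

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
W_q, s = quantize_per_channel_int8(W)
print("max abs reconstruction error:", np.abs(W - dequantize(W_q, s)).max())
```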
This survey provides a comprehensive overview of low-bit quantization methods for large language models, covering fundamental principles, system implementations, and algorithmic approaches that improve the efficiency and deployability of LLMs.
The proposed DuQuant method effectively mitigates the impact of both massive and normal outlier activations in large language models through strategic rotation and permutation transformations, leading to substantial performance improvements in low-bit quantization scenarios.
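The key mechanism is that orthogonal transforms can be folded into adjacent weights without changing the layer output, while redistributing outlier energy across channels. The NumPy sketch below illustrates this invariance with a random block-diagonal rotation and a random channel permutation; the actual DuQuant transforms are built from calibration statistics (outlier-aware rotations alternated with permutations), so everything here is a simplified assumption.

```python
import numpy as np

def block_rotation(dim: int, block: int, rng) -> np.ndarray:
    """Block-diagonal orthogonal matrix: an independent random rotation per block."""
    R = np.zeros((dim, dim))
    for i in range(0, dim, block):
        Q, _ = np.linalg.qr(rng.standard_normal((block, block)))
        R[i:i + block, i:i + block] = Q
    return R

rng = np.random.default_rng(0)
d, block = 16, 4
x = rng.standard_normal((2, d))
x[:, 3] *= 50.0                              # inject a "massive" outlier channel
W = rng.standard_normal((d, d))

P = np.eye(d)[rng.permutation(d)]            # channel permutation (orthogonal)
R = block_rotation(d, block, rng)            # block-diagonal rotation (orthogonal)
T = P @ R                                    # combined transform

x_t = x @ T                                  # transformed activations
W_t = T.T @ W                                # inverse transform folded into the weights

print("output unchanged:", np.allclose(x @ W, x_t @ W_t))
# The rotation mixes the outlier channel with its block, typically shrinking the peak magnitude.
print("max |activation| before vs. after:", np.abs(x).max(), np.abs(x_t).max())
```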
APTQ (Attention-aware Post-Training Mixed-Precision Quantization) is a novel technique that accounts for the nonlinear effect of attention outputs and leverages second-order Hessian information to achieve high-quality quantization of large language models, enabling efficient deployment on edge devices.
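APTQ itself combines this sensitivity signal with attention-aware calibration and weight updates; the sketch below only illustrates the general mixed-precision idea of spending more bits on Hessian-sensitive layers, using a hypothetical allocate_bits helper and made-up sensitivity values.

```python
import numpy as np

def allocate_bits(sensitivities, bit_choices=(2, 4), avg_budget=3.0):
    """Greedy mixed-precision allocation: the most Hessian-sensitive layers get the
    higher bit-width until the average bit budget is exhausted."""
    n = len(sensitivities)
    bits = np.full(n, min(bit_choices), dtype=int)
    order = np.argsort(sensitivities)[::-1]      # most sensitive layers first
    for idx in order:
        trial = bits.copy()
        trial[idx] = max(bit_choices)
        if trial.mean() <= avg_budget:           # accept only if budget still holds
            bits = trial
    return bits

# Hypothetical per-layer Hessian-trace sensitivities (e.g. estimated on a small
# calibration set); the values are made up purely for illustration.
sens = np.array([0.9, 0.1, 0.5, 0.05, 0.7, 0.3])
print(allocate_bits(sens))   # -> [4 2 4 2 4 2] under a 3-bit average budget
```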
Kashin Quantization, a novel data quantization approach, can efficiently compress large language models while maintaining competitive or superior predictive performance.
QuaRot is a new quantization scheme that rotates large language models to remove outliers from the hidden state, enabling end-to-end 4-bit quantization of weights, activations, and the KV cache without any high-precision channels.
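A core ingredient is computational invariance: multiplying the hidden state by an orthogonal (e.g. Hadamard) matrix and folding its inverse into the neighbouring weights leaves the output unchanged while spreading outlier energy across channels, which makes 4-bit quantization far less damaging. The NumPy sketch below shows only this invariance step, not QuaRot's full pipeline (additional online transforms and the actual 4-bit quantizers).

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an orthonormal Hadamard matrix (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

d = 16
rng = np.random.default_rng(0)
x = rng.standard_normal((1, d))
x[0, 5] = 100.0                      # an outlier channel in the hidden state
W = rng.standard_normal((d, d))

Q = hadamard(d)                      # orthogonal rotation
x_rot = x @ Q                        # rotated hidden state: outlier energy is spread out
W_rot = Q.T @ W                      # rotation folded into the next layer's weights

print("output preserved:", np.allclose(x @ W, x_rot @ W_rot))
print("max |activation| before vs. after:", np.abs(x).max(), np.abs(x_rot).max())
```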