Efficient Quantized Matrix Multiplication for Accelerating Large-Scale Generative Language Models
LUT-GEMM is an efficient kernel for quantized matrix multiplication. It eliminates the resource-intensive dequantization step and reduces computational cost compared with previous weight-only quantization kernels, substantially accelerating token-generation latency in large-scale generative language models.
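The summary above does not spell out how dequantization is avoided. As a hedged illustration (the group size `MU`, helper names, and 1-bit binary-coding quantization setup below are assumptions for the sketch, not the paper's actual kernel or API), a lookup-table approach can precompute, for each small group of activations, the partial sums for every possible sign pattern; a binary weight code then selects a precomputed entry instead of being dequantized and multiplied:

```python
# Sketch of a LUT-based dot product for 1-bit binary-coding quantization
# (BCQ): w ≈ alpha * b with b in {-1, +1}. Names and group size are
# illustrative assumptions, not LUT-GEMM's real implementation.
import itertools
import random

MU = 4  # sub-vector length; each LUT has 2**MU entries per group

def build_luts(x):
    """For each group of MU activations, precompute the partial sum
    x_group . s for every sign pattern s in {-1, +1}^MU."""
    luts = []
    for g in range(0, len(x), MU):
        group = x[g:g + MU]
        lut = {}
        for signs in itertools.product((-1, 1), repeat=MU):
            lut[signs] = sum(xi * si for xi, si in zip(group, signs))
        luts.append(lut)
    return luts

def lut_dot(luts, b, alpha):
    """Dot product x . (alpha * b) via table lookups: one lookup per
    group replaces MU multiply-adds and any dequantization of b."""
    total = 0.0
    for g, lut in enumerate(luts):
        pattern = tuple(b[g * MU:(g + 1) * MU])
        total += lut[pattern]
    return alpha * total

random.seed(0)
n = 16
x = [random.uniform(-1, 1) for _ in range(n)]
w = [random.uniform(-1, 1) for _ in range(n)]
alpha = sum(abs(wi) for wi in w) / n          # 1-bit BCQ scale factor
b = [1 if wi >= 0 else -1 for wi in w]        # binary weight codes

direct = sum(xi * alpha * bi for xi, bi in zip(x, b))  # dequantize-then-multiply
via_lut = lut_dot(build_luts(x), b, alpha)
assert abs(direct - via_lut) < 1e-9
```

Because the lookup tables depend only on the activations, they are built once per input vector and reused across all output rows, which is where the claimed savings over dequantize-then-multiply kernels would come from.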