IM-Unpack: Unpacking Integer Matrices for Efficient Training and Inference in Transformer Models
Core Concepts
Unpacking integer matrices that contain heavy hitter entries into larger matrices of low bit-width integers enables efficient training and inference in Transformer models.
Abstract
The paper examines the use of low bit-width integers to approximate General Matrix Multiply (GEMM) operations in deep learning, where a small number of unusually large entries, the heavy hitters, cannot be represented at low bit-width. It introduces the IM-Unpack algorithm, which unpacks such a matrix into a slightly larger matrix whose entries all fit within the target bit-width, so that the exact GEMM result can be recovered using only low bit-width integer GEMMs, simplifying hardware support.
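To make this concrete, here is a minimal NumPy sketch of the identity that unpacking exploits. It is an illustration under assumptions, not the paper's algorithm (IM-Unpack unpacks rows and columns of the matrix itself into a larger matrix rather than summing separate products): an integer matrix with heavy hitters is split into base-b digit planes whose entries all fit within a low bit-width, and the exact product is recovered by scaled accumulation. The number of planes is the multiplicative overhead in GEMM cost.

```python
import numpy as np

def digit_planes(A: np.ndarray, base: int):
    """Split integer matrix A into planes P_k with entries in (-base, base),
    so that A == sum_k base**k * P_k."""
    sign, mag = np.sign(A), np.abs(A)
    planes = []
    while mag.any():
        planes.append(sign * (mag % base))  # low bit-width digits, signed
        mag //= base
    return planes or [np.zeros_like(A)]

rng = np.random.default_rng(0)
A = rng.integers(-7, 8, size=(4, 5))
A[0, 0] = 1000                         # a heavy hitter far outside the 4-bit range
B = rng.integers(-7, 8, size=(5, 3))

# Each GEMM below only sees entries in (-8, 8); the powers of the base are
# applied once during accumulation, outside the low bit-width multiplies.
planes = digit_planes(A, base=8)
exact = sum((8 ** k) * (P @ B) for k, P in enumerate(planes))
assert np.array_equal(exact, A @ B)
print(f"{len(planes)}x more low bit-width GEMMs than one full-precision GEMM")
```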
Structure:
Abstract:
GEMM efficiency is crucial in deep learning.
Low bit-width integers can approximate matrix entries but face challenges with heavy hitters.
Introduction:
Importance of numerical precision in GEMM operations.
Efficiency gains through low bit-width GEMMs.
Efficacy of Integers:
Quantization strategies for inference efficiency.
Efficacy of Integers: Training:
Evaluation of quantized training using round-to-nearest (RTN) quantization (see the sketch after this outline).
What happens with Low Bit-Width?:
Challenges with representing heavy hitters using low bit-width integers.
IM-Unpack: Integer Matrix Unpacking:
Algorithm that unpacks a matrix containing heavy hitters into a larger matrix whose entries all fit within a low bit-width, while preserving the exact GEMM result.
Evaluating Unpacking Overhead:
Analysis of the increase in computational cost incurred by unpacking.
Conclusion:
Verification of the efficacy of integer GEMMs and introduction of the IM-Unpack algorithm.
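Since RTN is a standard technique rather than something defined in this summary, the following is only a hedged sketch of symmetric per-tensor round-to-nearest quantization; the function and parameter names are illustrative, not from the paper. Note how a single heavy hitter inflates the scale and wastes resolution on every other entry, which is exactly the failure mode that motivates IM-Unpack.

```python
import numpy as np

def rtn_quantize(X: np.ndarray, bits: int):
    """Symmetric per-tensor RTN: pick a scale, round to the integer grid, clamp."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(X).max()) / qmax if X.any() else 1.0
    Q = np.clip(np.rint(X / scale), -qmax - 1, qmax).astype(np.int32)
    return Q, scale                    # X is approximated by Q * scale

X = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
X[0, 0] = 50.0                         # a heavy hitter stretches the scale ...
Q, s = rtn_quantize(X, bits=4)
print(np.abs(X - Q * s).max())         # ... so the rounding error grows with it
```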
Key Quotes
"GEneral Matrix Multiply (GEMM) is a central operation in deep learning."
"A popular strategy is the use of low bit-width integers to approximate the original entries in a matrix."
"For many popular models, the overhead of additional operations when using IM-Unpack is quite small."