Efficient Compression of Large Language Model Weights Using Variable Precision, Variable Range Numeric Data Types
This paper presents a framework for compressed, variable-precision, variable-range, user-defined numerical data types that add flexibility to the computation of neural networks. These data types enable significant bandwidth reduction and storage savings for the weights of large language models and convolutional neural networks.
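To make the idea of a variable-precision, variable-range numeric type concrete, the sketch below simulates quantizing float32 weights to a hypothetical custom floating-point format with a configurable number of exponent bits (controlling range) and mantissa bits (controlling precision). The function name `simulate_custom_float`, the symmetric exponent range, and the round-to-nearest mantissa policy are illustrative assumptions, not the paper's actual encoding.

```python
import numpy as np

def simulate_custom_float(x, exp_bits=4, man_bits=3):
    """Simulate a custom float format with `exp_bits` exponent bits
    (variable range) and `man_bits` mantissa bits (variable precision).

    Illustrative sketch: exponent range and rounding rule are assumptions,
    not the encoding defined in the paper.
    """
    # Decompose x = m * 2**e with m in [0.5, 1) (or m == 0 for x == 0).
    m, e = np.frexp(np.asarray(x, dtype=np.float64))
    # Clamp the exponent to a symmetric range representable in exp_bits.
    emax = 2 ** (exp_bits - 1)
    e = np.clip(e, -emax + 1, emax)
    # Round the mantissa to man_bits stored bits (plus the implicit leading bit).
    scale = 2.0 ** (man_bits + 1)
    m = np.round(m * scale) / scale
    return np.ldexp(m, e)

# Example: quantize a small weight tensor and measure the worst-case
# relative error, which is bounded by 2**-(man_bits + 1) for in-range values.
w = np.array([0.3, -1.7, 0.002, 5.0], dtype=np.float32)
q = simulate_custom_float(w, exp_bits=5, man_bits=4)
rel_err = np.max(np.abs(q - w) / np.abs(w))
```

Such a simulation is useful for exploring the accuracy/footprint trade-off before committing to a packed storage layout: an (1 sign, 5 exponent, 4 mantissa)-bit format here stores each weight in 10 bits instead of 32.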