Optimizing Datatype Formats for Accurate and Efficient Large Language Model Inference
Profiling the weight and activation distributions of deep neural networks (DNNs) shows that they are best approximated by Student's t-distributions. This observation leads to the derivation of Student Float (SF4), a 4-bit datatype that is optimal under that distribution and improves model accuracy over existing formats. Supernormal support variants of E2M1 and APoT4 further improve the tradeoff between efficiency and accuracy.
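
To make the idea of a distribution-derived datatype concrete, the following is a minimal sketch of how a 16-level codebook could be built from the quantiles of a Student's t-distribution and used for absmax-scaled rounding. It is not the SF4 derivation itself: the degrees of freedom (nu = 5), the midpoint-quantile spacing, the absence of an exact zero level, and all function names (t_pdf, t_cdf, t_ppf, make_t_levels, quantize, dequantize) are assumptions made for illustration. The quantile function is computed numerically so that the sketch depends only on the Python standard library.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=1000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_ppf(p, nu, lo=-50.0, hi=50.0):
    """Quantile function, found by bisection on the CDF."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def make_t_levels(nu=5.0, bits=4):
    """Illustrative codebook: midpoint quantiles of t(nu), rescaled to [-1, 1].

    Assumption: equal-probability bins with symmetric levels. The positive
    half is computed once and mirrored, so the levels are exactly symmetric.
    """
    n = 2 ** bits
    pos = [t_ppf(0.5 + (i + 0.5) / n, nu) for i in range(n // 2)]
    top = pos[-1]
    return [-q / top for q in reversed(pos)] + [q / top for q in pos]

def quantize(values, levels):
    """Absmax-scale the values, then map each to the index of its nearest level."""
    scale = max(abs(v) for v in values) or 1.0
    codes = [
        min(range(len(levels)), key=lambda i: abs(v / scale - levels[i]))
        for v in values
    ]
    return codes, scale

def dequantize(codes, scale, levels):
    """Reconstruct approximate values from level indices and the scale."""
    return [levels[c] * scale for c in codes]

LEVELS = make_t_levels(nu=5.0, bits=4)

if __name__ == "__main__":
    print([round(v, 4) for v in LEVELS])
    codes, scale = quantize([-2.0, 0.1, 0.7, 2.0], LEVELS)
    print(codes, dequantize(codes, scale, LEVELS))
```

Because the t-distribution has heavier tails than a normal distribution, a codebook built this way places its levels differently from one built on normal quantiles, which is the kind of difference the profiling result above motivates. A practical format would also need to decide how to represent zero and how to choose the scale per block of values; both are left out of this sketch.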