Efficient LLM Inference using Custom Microscaling Formats: A Dataflow Compiler Approach
MASE, a novel compiler, automatically explores mixed-precision quantization using custom Microscaling (MX) formats to enable efficient dataflow hardware acceleration for large language models (LLMs) with minimal accuracy degradation.
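To make the idea of a Microscaling format concrete, here is a minimal sketch of block-wise quantization in which every element of a block shares one power-of-two scale. It is an illustration under stated assumptions, not MASE's implementation: the function names (`mx_quantize_block`, `mx_dequantize_block`, `mx_quantize`) are hypothetical, the elements are stored as signed integers for simplicity, and the block size of 32 and 8-bit element width are example choices that a mixed-precision search would vary per tensor.

```python
import math

def mx_quantize_block(block, elem_bits=8):
    """Quantize one block to signed integers that share a power-of-two scale.

    Assumption: integer elements stand in for the element formats a real
    MX configuration might use.
    """
    qmax = 2 ** (elem_bits - 1) - 1           # largest element magnitude
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0] * len(block)            # all-zero block: any scale works
    # Smallest exponent whose scale keeps amax / scale within [-qmax, qmax].
    shared_exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** shared_exp
    # Round to nearest integer, clamping as a guard against float edge cases.
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return shared_exp, q

def mx_dequantize_block(shared_exp, q):
    """Recover approximate real values from a shared exponent and integers."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in q]

def mx_quantize(values, block_size=32, elem_bits=8):
    """Split a flat list into blocks and quantize each block independently."""
    return [
        mx_quantize_block(values[i:i + block_size], elem_bits)
        for i in range(0, len(values), block_size)
    ]

if __name__ == "__main__":
    exp, q = mx_quantize_block([0.5, -1.0, 0.25, 0.0])
    print(exp, q, mx_dequantize_block(exp, q))
```

Because the scale is shared, each block stores one exponent plus `block_size` narrow elements, so the per-element storage cost approaches `elem_bits` as the block grows. Lowering `elem_bits` for a tensor trades accuracy for hardware cost, which is the kind of per-tensor choice a mixed-precision search explores.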