Efficient Support for Large Language Models Through FP6-Centric Algorithm-System Co-Design
Six-bit floating-point quantization (FP6) improves LLM inference efficiency by shrinking model size while preserving model quality. The TC-FPx design scheme provides unified Tensor Core support for weights at various quantization bit-widths.
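To make the idea of a 6-bit floating-point weight concrete, the sketch below encodes and decodes values in an assumed e3m2 layout (1 sign bit, 3 exponent bits, 2 mantissa bits, bias 3). The bit layout, rounding, and clamping behavior here are illustrative assumptions, not the exact encoding used by TC-FPx:

```python
import math

EXP_BITS, MAN_BITS = 3, 2
BIAS = (1 << (EXP_BITS - 1)) - 1          # exponent bias = 3 (assumed)

def fp6_encode(x: float) -> int:
    """Round x to the nearest value of an assumed e3m2 6-bit format."""
    sign = 1 if x < 0 else 0
    x = abs(x)
    if x == 0.0:
        return sign << (EXP_BITS + MAN_BITS)
    frac, e = math.frexp(x)               # x = frac * 2**e, frac in [0.5, 1)
    exp = e - 1 + BIAS                    # biased exponent for 1.m form
    if exp <= 0:                          # subnormal range
        man = round(x / 2 ** (1 - BIAS) * (1 << MAN_BITS))
        exp = 0
        if man >= (1 << MAN_BITS):        # rounded up into the normal range
            exp, man = 1, 0
    else:
        man = round((frac * 2 - 1) * (1 << MAN_BITS))
        if man == (1 << MAN_BITS):        # mantissa overflow: bump exponent
            man, exp = 0, exp + 1
        max_exp = (1 << EXP_BITS) - 1
        if exp > max_exp:                 # clamp to the largest finite value
            exp, man = max_exp, (1 << MAN_BITS) - 1
    return (sign << (EXP_BITS + MAN_BITS)) | (exp << MAN_BITS) | man

def fp6_decode(code: int) -> float:
    """Reconstruct the real value represented by a 6-bit code."""
    sign = -1.0 if (code >> (EXP_BITS + MAN_BITS)) & 1 else 1.0
    exp = (code >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = code & ((1 << MAN_BITS) - 1)
    if exp == 0:                          # subnormal: no implicit leading 1
        return sign * man / (1 << MAN_BITS) * 2 ** (1 - BIAS)
    return sign * (1 + man / (1 << MAN_BITS)) * 2 ** (exp - BIAS)
```

With these parameters the representable range is small (largest finite magnitude 28.0), which is why weights are typically scaled per channel or per group before being quantized to such a format.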