Core Concepts
DeltaDQ is a novel compression framework that significantly reduces the memory footprint of fine-tuned large language models (LLMs) while maintaining accuracy, enabling the deployment of multiple models on resource-constrained hardware.
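The summary does not spell out the mechanism, but, as the name suggests, DeltaDQ compresses the delta between a fine-tuned model and its base model. Below is a minimal sketch of that general idea, combining group-wise dropout of the delta with simple uniform quantization; the group size, drop rate, and bit width here are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def compress_delta(base_w, tuned_w, group_size=16, drop_rate=0.5, bits=4, seed=0):
    """Illustrative delta compression: drop whole groups of the delta at
    random (group-wise dropout), then uniformly quantize the survivors.
    All parameters here are hypothetical, not DeltaDQ's actual settings."""
    delta = (tuned_w - base_w).reshape(-1, group_size)
    rng = np.random.default_rng(seed)
    keep = rng.random(delta.shape[0]) >= drop_rate          # per-group keep mask
    kept = delta[keep]
    qmax = 2 ** (bits - 1) - 1
    scale = max(float(np.abs(kept).max()), 1e-12) / qmax if kept.size else 1.0
    quantized = np.round(kept / scale).astype(np.int8)      # low-bit integer codes
    return quantized, scale, keep

def decompress_delta(base_w, quantized, scale, keep, group_size=16):
    """Rebuild an approximate fine-tuned weight matrix from the base weights
    plus the sparse, dequantized delta."""
    delta = np.zeros((keep.size, group_size), dtype=base_w.dtype)
    delta[keep] = quantized.astype(base_w.dtype) * scale
    return base_w + delta.reshape(base_w.shape)

base = np.random.randn(64, 64).astype(np.float32)
tuned = base + 0.01 * np.random.randn(64, 64).astype(np.float32)
q, scale, keep = compress_delta(base, tuned)
restored = decompress_delta(base, q, scale, keep)
print("max reconstruction error:", float(np.abs(restored - tuned).max()))
```

Because only the base model is stored once, each additional fine-tuned variant then costs only its compressed delta, which is what makes multi-model deployment on constrained hardware feasible.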
Stats
DeltaDQ achieves 16x compression with improved accuracy compared to baselines for WizardMath and WizardCoder models across different parameter scales.
DeltaDQ also reaches ultra-high compression ratios: 128x for the WizardMath-7B model and 512x for the WizardMath-70B model.
DeltaDQ surpasses state-of-the-art accuracy on the WizardMath-7B and 13B models by 4.40 and 2.20 points, respectively.
DeltaDQ outperforms the original WizardCoder-7B and 13B models by 3.05 and 1.22 points, respectively.
With 32x compression on the WizardMath-7B model and 128x compression on the WizardMath-70B model, DeltaDQ improves accuracy over baselines by 6.60 and 0.83 points, respectively.
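For a sense of scale, the back-of-the-envelope arithmetic below converts these ratios into approximate storage per compressed delta, assuming the uncompressed weights are stored in FP16 (2 bytes per parameter); the helper is hypothetical and uses only the parameter counts implied by the model names.

```python
def delta_footprint_gib(params_billions, ratio, bytes_per_param=2):
    """Approximate compressed-delta size in GiB, assuming an FP16 baseline."""
    return params_billions * 1e9 * bytes_per_param / ratio / 2**30

for params_billions, ratio in [(7, 16), (7, 128), (70, 512)]:
    size = delta_footprint_gib(params_billions, ratio)
    print(f"{params_billions}B model at {ratio}x compression: ~{size:.2f} GiB")
```

Under these assumptions, even a 70B fine-tune at 512x shrinks to roughly a quarter of a GiB per delta.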