TinySaver uses tiny models to take over part of a large model's workload, efficiently reducing its computational demands.
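One common way a tiny model can "save" a large model is confidence-gated routing: answer with the tiny model when it is confident, and escalate only the hard inputs. A minimal sketch, assuming that style of gating (the threshold and the linear stand-in models are illustrative, not TinySaver's actual mechanism):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cascade_predict(x, tiny_model, large_model, threshold=0.9):
    """Route each input through the tiny model first; fall back to the
    large model only when the tiny model's confidence is low."""
    probs = softmax(tiny_model(x))
    conf = probs.max(axis=-1)
    preds = probs.argmax(axis=-1)
    hard = conf < threshold            # inputs the tiny model is unsure about
    if hard.any():
        big_probs = softmax(large_model(x[hard]))
        preds[hard] = big_probs.argmax(axis=-1)
    return preds, hard.mean()          # predictions, fraction escalated
```

The compute saving is the fraction of inputs that never reach the large model, at the cost of whatever accuracy the tiny model loses on the inputs it keeps.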
T3DNet proposes a two-stage method for compressing 3D point cloud models, achieving high compression rates without significant loss of accuracy.
Lossless and tunable lossy compression techniques can significantly reduce the storage and network bandwidth requirements of large foundation models without compromising their accuracy.
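The described combination, a tunable lossy step followed by lossless coding, can be sketched with uniform quantization plus zlib; both choices are illustrative stand-ins, not the paper's actual codecs:

```python
import zlib
import numpy as np

def compress_weights(w, bits=8):
    """Tunable lossy step (uniform quantization to `bits` levels) followed by
    a lossless entropy coder (zlib)."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (2**bits - 1) or 1.0   # avoid zero scale for constant w
    q = np.round((w - lo) / scale).astype(np.uint8 if bits <= 8 else np.uint16)
    blob = zlib.compress(q.tobytes(), level=9)
    return blob, (lo, scale, w.shape, q.dtype)

def decompress_weights(blob, meta):
    lo, scale, shape, dtype = meta
    q = np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)
    return q.astype(np.float32) * scale + lo
```

Lowering `bits` is the tuning knob: it trades reconstruction error (bounded by half the quantization step) for a smaller blob.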
This paper introduces Convex Distillation, a model compression technique that leverages convex optimization to distill large, non-convex deep neural networks into smaller, more efficient convex networks, achieving comparable performance while eliminating the need for post-compression fine-tuning on labeled data.
DeltaDQ is a novel compression framework that significantly reduces the memory footprint of fine-tuned large language models (LLMs) while maintaining accuracy, enabling the deployment of multiple models on resource-constrained hardware.
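The core idea of storing only a compressed delta between base and fine-tuned weights can be sketched with group-wise low-bit quantization; the per-row grouping and 4-bit width here are illustrative, not DeltaDQ's exact scheme:

```python
import numpy as np

def quantize_delta(base, finetuned, bits=4):
    """Quantize only (finetuned - base); the shared base weights are stored
    once and reused by every fine-tuned variant."""
    delta = finetuned - base
    levels = 2**bits - 1
    # per-row (group-wise) symmetric quantization of the delta
    scale = np.abs(delta).max(axis=1, keepdims=True) / (levels / 2)
    scale[scale == 0] = 1.0
    q = np.clip(np.round(delta / scale), -(levels // 2), levels // 2)
    return q.astype(np.int8), scale   # int8 container for the low-bit codes

def reconstruct(base, q, scale):
    return base + q.astype(np.float32) * scale
```

Deploying k fine-tuned variants then costs one full-precision base plus k low-bit deltas, instead of k full models.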
Self-data distillation, a novel fine-tuning technique leveraging the original unpruned model to generate a distilled dataset, effectively mitigates quality degradation in pruned large language models, outperforming standard supervised fine-tuning methods.
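The recipe, relabeling the fine-tuning data with the original unpruned model before fine-tuning the pruned one, can be illustrated with a toy linear model; least squares stands in for gradient fine-tuning, and nothing here is the paper's actual LLM pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the models: an original dense model and a magnitude-pruned copy.
W_orig = rng.normal(size=(8, 3))
W_pruned = W_orig * (np.abs(W_orig) > 0.3)

X = rng.normal(size=(256, 8))          # fine-tuning inputs

# Self-data distillation: the *original* model generates the targets,
# instead of using ground-truth labels.
distilled_Y = X @ W_orig

# "Fine-tune" the pruned model on the distilled dataset, only over the
# surviving weights (least squares as a stand-in for gradient descent).
mask = W_pruned != 0
W_ft = np.zeros_like(W_pruned)
for j in range(W_ft.shape[1]):
    cols = mask[:, j]
    W_ft[cols, j] = np.linalg.lstsq(X[:, cols], distilled_Y[:, j], rcond=None)[0]
```

Because the targets come from the original model, fine-tuning pulls the pruned model back toward the original's behavior rather than toward possibly distribution-shifted supervised labels.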
SGLP is a novel layer pruning method that leverages representation similarity and efficient importance evaluation to compress large deep learning models, achieving a balance between model size, computational efficiency, and performance.
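Representation-similarity layer pruning of this kind can be sketched by scoring consecutive layers' activations with cosine similarity; this is a generic sketch, and SGLP's actual segmentation and importance evaluation are more involved:

```python
import numpy as np

def layer_similarities(activations):
    """Cosine similarity between consecutive layers' (flattened) outputs.
    sims[i] compares layer i's output with layer i+1's output."""
    sims = []
    for a, b in zip(activations[:-1], activations[1:]):
        a, b = a.ravel(), b.ravel()
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return sims

def prune_most_redundant(sims, k=1):
    """Indices i of the k most redundant transitions: a similarity near 1
    means layer i+1 changes its input very little, so it is a pruning candidate."""
    return sorted(np.argsort(sims)[::-1][:k].tolist())
```

Layers whose outputs are nearly identical to their inputs contribute little new representation, so removing them trades minimal accuracy for proportional savings in depth and compute.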
This paper introduces a post-training pruning method for foundation models that directly formulates and solves the Multiple Removal Problem (MRP), pruning multiple weights simultaneously and achieving higher accuracy than existing techniques without retraining.
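Joint multiple-weight removal has a closed form under a local quadratic loss model: this is the classic optimal-brain-surgeon update extended to a set of weights, sketched here as background rather than the paper's exact MRP solver (`H` is an assumed Hessian approximation):

```python
import numpy as np

def remove_weights(w, H, idx):
    """Jointly zero the weights in `idx`, updating the rest to minimize the
    quadratic loss increase 0.5 * d^T H d subject to (w + d)[idx] = 0."""
    Hinv = np.linalg.inv(H)
    S = Hinv[np.ix_(idx, idx)]                 # (H^-1) restricted to idx
    lam = np.linalg.solve(S, w[idx])           # Lagrange multipliers
    d = -Hinv[:, idx] @ lam                    # compensating update to all weights
    loss_increase = 0.5 * w[idx] @ lam
    return w + d, loss_increase
```

Because the surviving weights are updated jointly with the removal, the loss increase is never worse than simply zeroing the same weights, which is what makes retraining-free pruning of several weights at once viable.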