SPA is a versatile structured pruning framework that prunes neural networks of any architecture, from any framework, and at any stage of training, achieving state-of-the-art pruning results without fine-tuning or calibration data.
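To make "structured pruning" concrete, here is a minimal, generic magnitude-based channel-pruning sketch. It is not SPA's actual algorithm (whose details are not given here); the names `channel_norms`, `prune_channels`, and the parameter `keep_ratio` are illustrative assumptions. It removes the lowest-norm output channels of one layer and the matching input columns of the next, so the network stays shape-consistent:

```python
import math

def channel_norms(weight):
    """L2 norm of each output channel (row) of a 2-D weight matrix."""
    return [math.sqrt(sum(w * w for w in row)) for row in weight]

def prune_channels(weight, next_weight, keep_ratio):
    """Drop the lowest-norm output channels of `weight` and the
    matching input columns of the following layer `next_weight`.
    Illustrative sketch only, not SPA's method."""
    norms = channel_norms(weight)
    n_keep = max(1, int(len(weight) * keep_ratio))
    # Indices of the channels with the largest norms, kept in original order.
    keep = sorted(sorted(range(len(weight)), key=lambda i: -norms[i])[:n_keep])
    pruned = [weight[i] for i in keep]
    pruned_next = [[row[i] for i in keep] for row in next_weight]
    return pruned, pruned_next

# Two toy fully connected layers: 2 -> 3 -> 2.
w1 = [[1.0, 0.0], [0.1, 0.1], [2.0, 1.0]]   # 3 output channels, 2 inputs
w2 = [[0.5, 0.2, 0.3], [0.1, 0.4, 0.6]]     # consumes those 3 channels
p1, p2 = prune_channels(w1, w2, keep_ratio=0.7)
print(len(p1), len(p2[0]))  # 2 channels remain, consistently, in both layers
```

Because whole channels are removed rather than individual weights, the pruned layers remain dense matrices and need no sparse kernels to realize the speedup.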
Optimizing the order in which multiple compression techniques are applied significantly reduces computation cost with minimal accuracy loss.