Core Concepts
SPA is a versatile structured pruning framework that can prune neural networks with any architecture, from any framework, and at any stage of training; its train-prune variant, OBSPA, achieves state-of-the-art pruning results without fine-tuning or calibration data.
Abstract
The paper introduces Structurally Prune Anything (SPA), a versatile structured pruning framework that can prune neural networks with any architecture, from any framework, and at any stage of training.
Key highlights:
SPA leverages a standardized computational graph and ONNX representation to prune diverse neural network architectures without the need for manual intervention.
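To make the ONNX point concrete, here is a minimal sketch using only the stock torch/onnx APIs (the model choice and input shape are arbitrary; this is not SPA's own code): export any model to the framework-neutral format, then walk the standardized operator graph, whose input/output tensor names are exactly what dependency grouping needs.

    import torch, torchvision, onnx

    # export any PyTorch model to the framework-neutral ONNX format
    model = torchvision.models.resnet18(weights=None).eval()
    torch.onnx.export(model, torch.randn(1, 3, 224, 224), "resnet18.onnx")

    # walk the standardized computational graph: each operator lists the
    # tensor names it consumes and produces, exposing dependencies
    graph = onnx.load("resnet18.onnx").graph
    for node in graph.node[:5]:
        print(node.op_type, list(node.input), "->", list(node.output))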
SPA employs a group-level importance estimation method, which groups dependent computational operators, estimates their importance, and prunes unimportant coupled channels.
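The sketch below illustrates one common way to score a coupled-channel group, aggregating an L2-norm importance across every tensor that shares the channel dimension. This is an illustrative criterion under that assumption, not necessarily the paper's exact estimator, and the conv/BN pairing and 30% ratio are made up for the example.

    import torch

    def group_channel_importance(weights):
        # weights: tensors whose dim 0 indexes the same coupled channel group
        # (e.g., a conv's out-channels plus the following BN's scale).
        # Returns one aggregated L2-norm importance score per channel.
        scores = torch.zeros(weights[0].shape[0])
        for w in weights:
            scores += w.flatten(1).norm(p=2, dim=1) if w.dim() > 1 else w.abs()
        return scores

    # hypothetical coupled group: conv out-channels and BN scale share dim 0
    conv_w = torch.randn(64, 32, 3, 3)
    bn_gamma = torch.randn(64)
    scores = group_channel_importance([conv_w, bn_gamma])
    prune_idx = scores.argsort()[: int(0.3 * 64)]  # drop least-important 30%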
SPA supports pruning at any time: before training, after training with fine-tuning, or after training without fine-tuning (the train-prune setting).
In the train-prune setting, the paper introduces Optimal Brain SPA (OBSPA), an algorithm that achieves state-of-the-art pruning results needing neither fine-tuning nor calibration data.
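OBSPA's name points back to the classic Optimal Brain Surgeon (OBS) family. As background only (the summary does not detail OBSPA's data-free algorithm, so this is not it), here is a numpy sketch of the original OBS rule for removing a single weight and compensating the rest:

    import numpy as np

    def obs_prune_step(w, H_inv):
        # OBS saliency for zeroing weight q: L_q = w_q^2 / (2 * [H^-1]_qq)
        saliency = w ** 2 / (2.0 * np.diag(H_inv))
        q = int(np.argmin(saliency))
        # compensate remaining weights: dw = -(w_q / [H^-1]_qq) * H^-1[:, q]
        w = w - (w[q] / H_inv[q, q]) * H_inv[:, q]
        w[q] = 0.0  # exact zero despite floating-point error
        return w, q

    # toy usage with a random positive-definite inverse Hessian
    rng = np.random.default_rng(0)
    w = rng.normal(size=8)
    A = rng.normal(size=(8, 8))
    H_inv = np.linalg.inv(A @ A.T + 8 * np.eye(8))
    w, q = obs_prune_step(w, H_inv)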
Extensive experiments show that SPA achieves competitive or state-of-the-art pruning performance across various architectures, from popular frameworks, at different pruning times.
Stats
The paper evaluates pruning performance with two key metrics:
Reduction in floating point operations (FLOPs), denoted as RF
Reduction in parameters, denoted as RP
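Assuming the standard definition of these ratios (fraction of the original cost removed by pruning; the paper's exact convention may differ), a two-line sketch with made-up numbers:

    def reduction(original, pruned):
        # fraction of the original cost removed by pruning
        return 1.0 - pruned / original

    # illustrative numbers only, not results from the paper
    print(f"RF = {reduction(4.1e9, 2.0e9):.1%}")    # FLOPs: 4.1G -> 2.0G
    print(f"RP = {reduction(25.6e6, 11.2e6):.1%}")  # params: 25.6M -> 11.2M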