Understanding the importance of initialization and iterative pruning in neural networks.
The SMART pruner ranks weight importance with a separate, learnable probability mask. Using a differentiable Top-k operator together with a dynamic temperature schedule, it reaches the target sparsity while escaping non-sparse local minima, achieving state-of-the-art results on block and output-channel pruning across a range of computer vision tasks and models.
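To make the idea concrete, here is a minimal sketch of one common way to build a differentiable Top-k mask with a temperature parameter: a sigmoid centered between the k-th and (k+1)-th largest importance scores, which relaxes toward a hard top-k selection as the temperature is annealed to zero. The function name `soft_topk_mask` and the exact form of the relaxation are illustrative assumptions, not the specific operator used by SMART.

```python
import numpy as np

def soft_topk_mask(scores, k, temperature):
    """Differentiable relaxation of a top-k binary mask (illustrative sketch).

    Centers a sigmoid between the k-th and (k+1)-th largest scores, so
    roughly the k highest-scoring entries receive mask values near 1.
    As temperature -> 0, the mask approaches a hard top-k selection.
    """
    sorted_scores = np.sort(scores)
    # Threshold halfway between the k-th and (k+1)-th largest scores.
    tau = 0.5 * (sorted_scores[-k] + sorted_scores[-k - 1])
    # Clip the argument to keep np.exp numerically safe at tiny temperatures.
    z = np.clip((scores - tau) / temperature, -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))

# Annealing the temperature sharpens the mask toward a hard top-2 selection.
scores = np.array([0.1, 2.0, -1.0, 3.0, 0.5])
for T in (1.0, 0.1, 0.01):
    print(T, np.round(soft_topk_mask(scores, k=2, temperature=T), 3))
```

At high temperature the mask is soft, so gradients flow to all importance scores; annealing gradually commits the mask to the target sparsity, which is the mechanism that lets such pruners escape non-sparse local minima.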