Key Idea
CRISP introduces a novel pruning framework combining fine-grained N:M and coarse-grained block sparsity to achieve high accuracy with superior hardware efficiency.
Summary
I. Abstract
- Machine learning pipelines aim to enhance computational efficiency by tailoring models to focus on user-specific classes.
- Existing works rely on unstructured or structured pruning, but they may lead to reduced model accuracy.
- CRISP proposes a hybrid structured sparsity pattern for efficient model pruning.
II. Introduction
- Personalizing models for individual users can reduce resource requirements and enhance efficiency.
- Unstructured pruning is inefficient unless the model is highly sparse.
- Existing class-aware pruning methods struggle with maintaining high accuracy at high sparsity levels.
III. CRISP Pruning Framework
A. Hybrid Structured Sparsity Pattern
- Combines fine-grained N:M and coarse-grained block sparsity.
- Ensures uniform load balancing and efficient hardware implementation.
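To make the hybrid pattern concrete, here is a minimal sketch in plain Python. It rests on several assumptions not stated in this summary: N:M is read in the common convention of keeping N of every M consecutive weights along a row, blocks are scored by the sum of their weight magnitudes, and "uniform" is taken to mean that every block-row keeps the same number of blocks. The function names (`nm_mask`, `block_mask`, `hybrid_mask`) are hypothetical and are not the authors' code.

```python
from typing import List

Matrix = List[List[float]]

def nm_mask(w: Matrix, n: int, m: int) -> List[List[int]]:
    """Fine-grained step: keep the n largest-magnitude weights in each group of m along a row."""
    mask = []
    for row in w:
        if len(row) % m != 0:
            raise ValueError("row length must be a multiple of m")
        row_mask = []
        for start in range(0, len(row), m):
            group = row[start:start + m]
            keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
            row_mask.extend(1 if i in keep else 0 for i in range(m))
        mask.append(row_mask)
    return mask

def block_mask(w: Matrix, b: int, keep_per_row: int) -> List[List[int]]:
    """Coarse-grained step: keep the same number of b x b blocks in every block-row."""
    rows, cols = len(w), len(w[0])
    if rows % b != 0 or cols % b != 0:
        raise ValueError("matrix dimensions must be multiples of b")
    mask = [[0] * cols for _ in range(rows)]
    for br in range(0, rows, b):
        # Assumed block score: sum of absolute weights inside the block.
        scores = {
            bc: sum(abs(w[r][c]) for r in range(br, br + b) for c in range(bc, bc + b))
            for bc in range(0, cols, b)
        }
        for bc in sorted(scores, key=scores.get, reverse=True)[:keep_per_row]:
            for r in range(br, br + b):
                for c in range(bc, bc + b):
                    mask[r][c] = 1
    return mask

def hybrid_mask(w: Matrix, n: int, m: int, b: int, keep_per_row: int) -> List[List[int]]:
    """A weight survives only if both the N:M mask and the block mask keep it."""
    fine = nm_mask(w, n, m)
    coarse = block_mask(w, b, keep_per_row)
    return [[f * c for f, c in zip(fr, cr)] for fr, cr in zip(fine, coarse)]

w = [
    [0.9, 0.8, 0.0, 0.1, 0.2, 0.1],
    [0.7, 0.6, 0.1, 0.0, 0.3, 0.2],
    [0.0, 0.1, 0.5, 0.4, 0.9, 0.8],
    [0.1, 0.0, 0.6, 0.5, 0.8, 0.7],
]
mask = hybrid_mask(w, n=1, m=2, b=2, keep_per_row=2)
for row in mask:
    print(row)
```

In this toy 4×6 example with 1:2 sparsity and two of three blocks kept per block-row, every row ends up with the same number of surviving weights, which is the load-balancing property the pattern is meant to provide.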
B. Method Overview
- Three-step iterative pruning process: class-aware fine-tuning, N:M pruning, and block pruning.
- Prevents layer collapse by gradually increasing global model sparsity over multiple iterations.
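The gradual increase in sparsity can be sketched as a simple schedule. The summary does not say which schedule CRISP uses, so the linear ramp below is purely an assumption, and `sparsity_schedule` is a hypothetical helper name.

```python
from typing import List

def sparsity_schedule(target: float, iterations: int) -> List[float]:
    """Return the global sparsity to reach at each pruning iteration (assumed linear ramp)."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target sparsity must be in [0, 1)")
    if iterations < 1:
        raise ValueError("need at least one iteration")
    return [target * (k + 1) / iterations for k in range(iterations)]

schedule = sparsity_schedule(target=0.9, iterations=3)
for step, s in enumerate(schedule, start=1):
    # Each iteration would run: class-aware fine-tuning, N:M pruning, block pruning.
    print(f"iteration {step}: prune to {s:.0%} global sparsity")
```

Spreading the pruning over several iterations means no single step removes a large share of any one layer, which is the intuition behind avoiding layer collapse.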
C. Pruning Algorithm
- Fine-tunes the model iteratively based on class-aware saliency scores.
- Uses a straight-through estimator for fine-grained N:M pruning and uniform block pruning.
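For readers unfamiliar with the straight-through estimator (STE), the following is a generic sketch on a single linear neuron, not CRISP's actual training loop. The forward pass uses the N:M-pruned weights, while the backward pass treats the masking as the identity, so the gradient also updates weights that are currently pruned. The neuron, the squared-error loss, the learning rate, and the names `nm_mask` and `ste_step` are all assumptions for illustration.

```python
from typing import List

def nm_mask(weights: List[float], n: int = 2, m: int = 4) -> List[int]:
    """Keep the n largest-magnitude weights in each group of m consecutive weights."""
    if len(weights) % m != 0:
        raise ValueError("length must be a multiple of m")
    mask = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        mask.extend(1 if i in keep else 0 for i in range(m))
    return mask

def ste_step(dense: List[float], x: List[float], target: float, lr: float = 0.1) -> List[float]:
    """One SGD step for y = sum(w_sparse * x) with loss 0.5 * (y - target)**2."""
    mask = nm_mask(dense)
    sparse = [w * k for w, k in zip(dense, mask)]   # forward pass sees pruned weights
    y = sum(w * xi for w, xi in zip(sparse, x))
    dy = y - target                                  # dL/dy
    grad_sparse = [dy * xi for xi in x]              # dL/d(sparse weight)
    # STE: pass the gradient straight through the mask to ALL dense weights,
    # so pruned weights keep learning and may re-enter the mask later.
    return [w - lr * g for w, g in zip(dense, grad_sparse)]

dense = [0.9, -0.1, 0.05, -0.7]
updated = ste_step(dense, x=[1.0, 1.0, 1.0, 1.0], target=1.0)
print(updated)
```

Note that the two weights masked out in the forward pass still change after the step, which is what distinguishes STE from simply freezing pruned weights at zero.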
D. Pruning Metric
- Measures weight importance based on gradient flow using user-preferred classes.
- Criteria include large weight magnitude with small gradients, small weight magnitude with large gradients, and both small weight magnitude and small gradients.
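A common gradient-based importance score is the first-order product |w · ∂L/∂w|, with the gradient computed only on samples from the user-preferred classes. Whether CRISP uses exactly this form is an assumption here; the sketch below only illustrates how such a score ranks the weight/gradient combinations listed above. The numeric values and the function name `saliency` are made up for illustration.

```python
def saliency(weight: float, grad: float) -> float:
    """Assumed first-order importance score: |weight * gradient|."""
    return abs(weight * grad)

# (weight, gradient) pairs; gradients are assumed to come from user-preferred classes only.
cases = {
    "large weight, small gradient": (2.0, 0.01),
    "small weight, large gradient": (0.01, 2.0),
    "small weight, small gradient": (0.01, 0.01),
    "large weight, large gradient": (2.0, 2.0),
}
scores = {name: saliency(w, g) for name, (w, g) in cases.items()}
ranked = sorted(scores, key=scores.get)  # least important first
for name in ranked:
    print(f"{name}: {scores[name]:.4f}")
```

Under this assumed score, the three combinations named in the list above all score far lower than a weight with both large magnitude and large gradient.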
E. Accelerator-level Design
- Introduces CRISP-STC, a hardware accelerator design inspired by NVIDIA's Sparse Tensor Core (STC).
- Achieves up to 30× higher energy efficiency than the dense model.
IV. Experimental Evaluation
A. Experimental Methodology
- Evaluates memory-intensive networks on the ImageNet dataset, comparing several CRISP configurations against the baselines OCAP and CAPNN.
B. Experimental Results
- CRISP consistently outperforms block pruning in terms of accuracy at higher sparsity levels.
V. Conclusion
CRISP offers an efficient solution for class-aware personalization through structured sparsity patterns, achieving high accuracy with superior hardware efficiency.
Statistics
In this study, CRISP achieved up to a 14× reduction in latency and up to 30× higher energy efficiency.