Core Concept
DEFA proposes algorithm-architecture co-design for efficient MSDeformAttn acceleration through pruning and parallel processing.
Summary
DEFA addresses the inefficiency of MSDeformAttn on general-purpose platforms such as CPUs and GPUs, providing pioneering acceleration support for MSDeformAttn in object detection tasks. It combines frequency-weighted pruning for feature maps with probability-aware pruning for sampling points, reducing the memory footprint by over 80%, and exploits multi-scale parallelism to boost throughput. Fine-grained layer fusion and feature-map reuse further cut memory accesses. Extensively evaluated, DEFA achieves substantial speedup and energy-efficiency improvements over powerful GPUs.
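To make the probability-aware pruning idea concrete, here is a minimal NumPy sketch of deformable-attention aggregation where only the top-k sampling points per query (ranked by attention probability) are kept and the remaining weights are renormalized. All shapes, names, and the `keep` parameter are illustrative assumptions, not DEFA's actual hardware scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: Q queries, P sampling points per query, C channels.
Q, P, C = 4, 8, 16
sampled = rng.standard_normal((Q, P, C))   # features gathered at sampling points
logits = rng.standard_normal((Q, P))
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # attention probs

def prune_points(weights, keep=4):
    """Keep the top-`keep` sampling points per query by attention probability."""
    idx = np.argsort(weights, axis=1)[:, -keep:]   # indices of the kept points
    mask = np.zeros_like(weights, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

mask = prune_points(weights, keep=4)
pruned_w = np.where(mask, weights, 0.0)
pruned_w /= pruned_w.sum(axis=1, keepdims=True)    # renormalize surviving weights

out_full = (weights[..., None] * sampled).sum(axis=1)     # dense aggregation
out_pruned = (pruned_w[..., None] * sampled).sum(axis=1)  # pruned aggregation
print(out_full.shape, out_pruned.shape)  # (4, 16) (4, 16)
```

Skipping the low-probability points halves the feature gathers per query in this toy setting; DEFA's reported >80% memory-footprint reduction combines this kind of sampling-point pruning with frequency-weighted pruning of the feature maps themselves.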
Statistics
DEFA achieves 10.1-31.9× speedup compared to powerful GPUs.
DEFA reduces memory footprint by over 80%.
DEFA surpasses related accelerators with a 2.2-3.7× energy-efficiency improvement.
Quotes
"DEFA adopts frequency-weighted pruning and probability-aware pruning for feature maps and sampling points respectively, alleviating the memory footprint by over 80%."
"Extensively evaluated on representative benchmarks, DEFA achieves 10.1-31.9× speedup and 20.3-37.7× energy efficiency boost compared to powerful GPUs."
"DEFA is the first work to exploit sparsity in fmaps and sampling points and to explore the multi-scale parallelism in MSGS processing."