DEFA introduces a novel architecture that combines frequency-weighted pruning and probability-aware pruning to reduce the memory footprint of multi-scale deformable attention (MSDeformAttn) by over 80%. It exploits multi-scale parallelism to boost throughput, and applies fine-grained layer fusion and feature-map reuse to further reduce memory access. In extensive evaluation, DEFA achieves substantial speedup and energy-efficiency improvements over GPUs. The design addresses the inefficiencies of MSDeformAttn on general-purpose platforms such as CPUs and GPUs, providing pioneering dedicated support for MSDeformAttn in object-detection tasks.
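To make the probability-aware pruning idea concrete: MSDeformAttn aggregates a small set of sampled feature vectors with softmax attention weights, so sampling points with near-zero weights contribute little and their multiply-accumulates can be skipped. The sketch below illustrates this principle in NumPy; the threshold value and the renormalization step are illustrative assumptions, not DEFA's exact hardware scheme.

```python
import numpy as np

def deform_attn_pruned(values, weights, threshold=0.02):
    """Illustrative sketch of probability-aware pruning (assumed
    formulation, not DEFA's exact hardware implementation).

    values:  (K, d) feature vectors sampled by MSDeformAttn
    weights: (K,)   softmax attention weights (sum to 1)

    Sampling points whose attention weight falls below `threshold`
    are skipped, saving their multiply-accumulate operations.
    """
    keep = weights >= threshold            # prune low-probability points
    kept_w = weights[keep]
    kept_w = kept_w / kept_w.sum()         # renormalize surviving weights
    output = kept_w @ values[keep]         # weighted sum over kept points
    return output, int(keep.sum())
```

On hardware, skipping the pruned points translates directly into fewer memory fetches and MAC operations per query, which is where the memory-footprint and throughput gains come from.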
Key insights distilled from the paper by Yansong Xu et al., arxiv.org, 03-19-2024: https://arxiv.org/pdf/2403.10913.pdf