Core Concepts
The authors study efficient mapping of convolutional layers onto the OpenEdgeCGRA, showing that direct convolution with weight parallelism (WP) delivers the best energy efficiency and latency.
Abstract
The study optimizes convolutional-layer mappings on the OpenEdgeCGRA for edge computing. It compares several mapping techniques, evaluating each on latency, energy consumption, memory usage, and performance, and finds direct convolution with weight parallelism to be the most efficient approach. The results show that, by exploiting these trade-offs, CGRAs become viable edge AI accelerators.
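The summary does not spell out the exact OpenEdgeCGRA mapping, but the general idea of direct convolution with weight parallelism can be sketched as follows: each processing element holds the weight of a different output-channel filter, the current activation is broadcast to all of them, and the per-output-channel MACs execute concurrently. The inner `for k` loop below models that parallel dimension; all names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def direct_conv_wp(x, w):
    """Direct convolution (no im2col, no padding, stride 1).

    The innermost loop over output channels k models weight parallelism:
    on a CGRA, each PE would hold w[k, c, r, s] and the K MACs would run
    in the same cycle. This is a hypothetical sketch, not the paper's code.
    """
    C, H, W = x.shape        # input channels, height, width
    K, _, R, S = w.shape     # output channels, kernel height/width
    out = np.zeros((K, H - R + 1, W - S + 1))
    for oh in range(H - R + 1):
        for ow in range(W - S + 1):
            for c in range(C):
                for r in range(R):
                    for s in range(S):
                        a = x[c, oh + r, ow + s]  # activation broadcast to all PEs
                        for k in range(K):        # parallel across PEs under WP
                            out[k, oh, ow] += a * w[k, c, r, s]
    return out
```

Reading the loop nest this way also explains the reported MAC/cycle figure: with K filters resident in the array, up to K MACs complete per activation fetch, so utilization depends on how well K matches the number of PEs.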
Stats
Direct convolution coupled with weight parallelism outperforms CPU implementation by 3.4× in terms of energy efficiency.
Direct convolution coupled with weight parallelism outperforms CPU implementation by 9.9× in terms of latency.
The WP approach reaches an average power of 2.5 mW.
The WP approach achieves a peak performance of 0.665 MAC/cycle.
Quotes
"The WP approach is the best performing one, with a peak performance of 0.665 MAC/cycle."
"Direct convolution with weight parallelism outperforms CPU implementation by 3.4× in terms of energy efficiency."