
Automatic Synthesis of PIM-Based CNN Accelerators


Core Concepts
The authors present the PIMSYN framework for the automatic synthesis of PIM-based CNN accelerators, aiming to optimize power efficiency and performance through architectural exploration.
Abstract
The paper introduces PIMSYN, an automatic framework for synthesizing processing-in-memory (PIM)-based CNN accelerators. It addresses the challenges of manual design by providing a comprehensive design space exploration and optimization process. By integrating weight duplication, dataflow compilation, macro partitioning, and component allocation stages, the framework significantly improves power efficiency and throughput compared to existing works.
Stats
Existing PIM accelerator designs rely heavily on expert experience.
PIMSYN improves power efficiency by several times compared to existing works.
ADCs and DACs in PIM accelerators consume over 60% of total power.
The scale of the defined design space can reach up to 10^27 for VGG13.
The SA-based filter selects weight duplication candidates with the lowest energy-function values.
Quotes
"The architecture abstraction is compatible with most previous works."
"PIMSYN integrates an architectural exploration flow in the synthesis framework."
"Existing PIM-based CNN accelerators are mostly manually designed."

Key Insights Distilled From

by Wanqian Li, X... at arxiv.org 02-29-2024

https://arxiv.org/pdf/2402.18114.pdf
PIMSYN

Deeper Inquiries

How does the utilization of ReRAM devices impact the overall power efficiency of PIM accelerators?

The utilization of ReRAM devices is central to the overall power efficiency of Processing-in-Memory (PIM) accelerators. ReRAM cells store the network weights as conductances, and analog matrix-vector multiplication is performed directly within the crossbars, enabling in-situ computation without frequent data transfers between storage and processing units. Since data movement is a major source of power consumption in traditional architectures, fusing computation and memory into a single unit lets ReRAM-based PIM accelerators achieve substantial power savings over conventional systems that separate processing and memory. The main caveat sits at the analog-digital boundary: the DACs that drive the crossbar rows and the ADCs that digitize the column currents consume over 60% of total power, making converter precision and count first-order design parameters.
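To make the crossbar mechanism concrete, below is a minimal sketch of an idealized analog matrix-vector multiply in Python. This is not PIMSYN code: the uniform DAC/ADC quantization model and the bit widths are simplifying assumptions, and real crossbars add device noise, IR drop, and limited conductance ranges.

```python
import numpy as np

def crossbar_mvm(weights, activations, dac_bits=8, adc_bits=8):
    """Idealized ReRAM crossbar matrix-vector multiply.

    weights     -- (rows, cols) array of cell conductances
    activations -- (rows,) input vector applied as row voltages
    Returns the digitized column currents, one per output column.
    """
    # DAC: quantize the input voltages to 2**dac_bits uniform levels
    v_max = float(np.abs(activations).max()) or 1.0
    levels = 2**dac_bits - 1
    v = np.round(activations / v_max * levels) / levels * v_max

    # Analog compute: by Kirchhoff's current law, each column current is
    # the dot product of the row voltages with that column's conductances
    currents = v @ weights

    # ADC: digitize each column current to 2**adc_bits uniform levels
    i_max = float(np.abs(currents).max()) or 1.0
    levels = 2**adc_bits - 1
    return np.round(currents / i_max * levels) / levels * i_max

w = np.random.rand(4, 3)   # one toy layer mapped onto a 4x3 crossbar
x = np.random.rand(4)
print(crossbar_mvm(w, x))  # analog result after DAC/ADC quantization
print(x @ w)               # exact digital reference
```

Note that each crossbar pass needs one DAC conversion per row and one ADC conversion per column, which is exactly why the converters, not the crossbar itself, dominate the power budget.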

What are the implications of inter-layer pipelining in PIM-based CNN accelerator design?

Inter-layer pipelining has significant implications for the design of PIM-based CNN accelerators. Unlike conventional CMOS-based accelerators that execute layers sequentially, transferring data between each layer, PIM accelerators store all CNN weights in ReRAM cells and execute multiple layers in parallel as stages of a pipeline. This eliminates repeated weight transfers and reduces the latency associated with inter-layer communication, improving performance. Because a pipeline advances only as fast as its slowest stage, inter-layer pipelining also motivates weight duplication: a bottleneck layer's weights are duplicated into multiple copies so that it can process several inputs in parallel, raising overall throughput. The cost is architectural complexity, since every choice of per-layer duplication factors implies a different hardware configuration, expanding the design space exponentially.
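The throughput trade-off can be illustrated with a toy balancing heuristic. The sketch below is hypothetical rather than PIMSYN's algorithm: the per-layer cycle counts and crossbar costs are invented, and it uses a simple greedy rule (always duplicate the current bottleneck layer) in place of the paper's search-based exploration.

```python
# Hypothetical per-layer pipeline intervals (cycles per input) and
# crossbar costs for a 3-layer CNN; numbers are illustrative only.
layers = {"conv1": {"cycles": 400, "xbars": 2},
          "conv2": {"cycles": 900, "xbars": 4},
          "conv3": {"cycles": 600, "xbars": 8}}

def duplicate_for_pipeline(layers, xbar_budget):
    """Greedily duplicate the bottleneck layer until the crossbar
    budget is exhausted. With duplication factor d, a layer accepts a
    new input every cycles/d cycles, so pipeline throughput is set by
    the maximum of cycles/d over all layers."""
    dup = {name: 1 for name in layers}
    used = sum(l["xbars"] for l in layers.values())
    while True:
        # Bottleneck = layer with the longest effective interval
        name = max(layers, key=lambda n: layers[n]["cycles"] / dup[n])
        cost = layers[name]["xbars"]   # one more full copy of its weights
        if used + cost > xbar_budget:
            return dup
        dup[name] += 1
        used += cost

print(duplicate_for_pipeline(layers, xbar_budget=40))
# -> {'conv1': 2, 'conv2': 4, 'conv3': 2}: the pipeline interval drops
#    from 900 to 300 cycles at the price of extra crossbars.
```

Even this toy version shows why the design space explodes: each layer's duplication factor interacts with every other's through the shared crossbar budget.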

How can automated synthesis frameworks like PIMSYN contribute to advancements in neural network acceleration beyond traditional methods?

Automated synthesis frameworks like PIMSYN can advance neural network acceleration beyond traditional methods by streamlining the design of energy-efficient Processing-in-Memory (PIM) based CNN accelerators. These frameworks leverage search algorithms such as simulated annealing (SA) and evolutionary algorithms (EA) to explore vast design spaces efficiently, optimizing architecture implementations for maximum power efficiency. By automating weight duplication determination, dataflow compilation (mapping CNN structures into intermediate representations), macro partitioning (distributing computational tasks among macros), and component allocation (assigning functional components within macros), they enable rapid generation of optimized accelerator designs tailored to specific CNN applications. This comprehensive exploration of architectural parameters yields power efficiency improvements of up to 21 times compared with manual designs. Ultimately, automated synthesis frees researchers and developers to focus on higher-level optimizations rather than intricate manual design processes, driving neural network acceleration toward more efficient and scalable solutions.
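As a concrete illustration of the SA-based filtering mentioned above, the sketch below anneals per-layer weight duplication factors against a made-up energy function (pipeline interval plus a penalty for exceeding a crossbar budget). PIMSYN's real energy function models power efficiency over a much richer space; every constant here is an assumption for demonstration.

```python
import math, random

random.seed(0)  # reproducible demo run

def energy(dup, cycles=(400, 900, 600), xbars=(2, 4, 8), budget=40):
    """Stand-in energy function: pipeline interval (lower is better)
    plus a heavy penalty when the crossbar budget is exceeded."""
    interval = max(c / d for c, d in zip(cycles, dup))
    used = sum(x * d for x, d in zip(xbars, dup))
    return interval + 1000 * max(0, used - budget)

def anneal(n_layers=3, steps=2000, t0=100.0):
    """Minimal simulated annealing over duplication-factor vectors."""
    cur = [1] * n_layers
    best = (energy(cur), cur[:])
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-3      # linear cooling schedule
        cand = cur[:]
        i = random.randrange(n_layers)          # perturb one layer
        cand[i] = max(1, cand[i] + random.choice((-1, 1)))
        delta = energy(cand) - energy(cur)
        # Metropolis rule: always accept downhill, sometimes uphill
        if delta < 0 or random.random() < math.exp(-delta / t):
            cur = cand
            if energy(cur) < best[0]:
                best = (energy(cur), cur[:])
    return best

print(anneal())  # lowest-energy (interval, duplication factors) found
```

The occasional uphill moves accepted at high temperature are what let SA escape local minima, which matters in design spaces as rugged and as large (up to 10^27 points) as the one PIMSYN explores.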