A High-Throughput SRAM-Based Charge-Domain Computing-in-Memory Macro with Single-ADC Interface and ReLU Optimization


Core Concept
This work presents a high-throughput SRAM-based charge-domain computing-in-memory (CD-CiM) macro that completes MAC+ReLU computation on two signed 8-bit vectors in one CiM cycle with a single A/D conversion, achieving significant ADC-related area and energy savings compared to prior works.
Abstract

The paper presents a novel SRAM-based CD-CiM architecture that addresses the key challenge of achieving high throughput together with multi-bit quantization. The key contributions are:

  1. Charge-Domain Analog Adder Tree (CAAT):
    • Uses a hybrid binary-C-2C capacitor network to reduce total capacitance area compared with prior exponentially binary-weighted designs (see the first sketch after this list).
    • Enables parallel in-column, in-bank, and in-array summation to complete an 8-bit MAC in one CiM cycle.
  2. Single-ADC Interface:
    • Folds the ReLU function into the ADC, allowing an early stop to zero when the MSB indicates a negative result, which cuts ADC energy roughly in half (see the second sketch below).
    • Reduces ADC-related overhead by 8x compared with prior serial-activation-input designs that require one A/D conversion per activation bit.
  3. Non-linearity Compensation:
    • Proposes an output-based fine-tune scheme that compensates for non-linearity in the CAAT and ADC, improving inference accuracy (see the third sketch below).
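To make the CAAT idea concrete, here is a minimal behavioral sketch in Python, not the paper's circuit: it checks that combining per-bit-plane partial MACs with binary weights reproduces a signed 8-bit dot product, and compares the total unit capacitance of a fully binary-weighted capacitor array against a hybrid binary-plus-C-2C split. The vector length, the 4-bit/4-stage split, and the unit-capacitor normalization are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the real macro's column/bank dimensions are not given in this excerpt.
N = 64                                   # vector length per column
x = rng.integers(-128, 128, size=N)      # signed 8-bit activations
w = rng.integers(-128, 128, size=N)      # signed 8-bit weights

# Ideal reference: signed 8b x 8b dot product (MAC result).
ref = int(np.dot(x, w))

def bit_planes(v, bits=8):
    """Two's-complement bit planes of a signed integer vector (LSB first)."""
    u = np.asarray(v) & 0xFF
    return [(u >> i) & 1 for i in range(bits)]

# Behavioral view of the CAAT: each activation bit plane produces a partial MAC,
# and the capacitor network combines them with binary weights by charge sharing.
# Here the weighting is emulated with a digital shift-add; the MSB plane carries
# the negative two's-complement weight.
acc = 0
for i, plane in enumerate(bit_planes(x)):
    weight = -(2 ** i) if i == 7 else 2 ** i
    acc += weight * int(np.dot(plane, w))
assert acc == ref

# Area argument: total capacitance (in unit caps) of an 8-bit binary-weighted
# array vs. a hybrid split with 4 binary-weighted bits + a 4-stage C-2C ladder
# (one unit cap per stage plus a 2C bridge cap per stage).  The actual split in
# the paper may differ; this only illustrates why the hybrid network is smaller.
c_full_binary = sum(2 ** i for i in range(8))                  # 255 units
c_hybrid      = sum(2 ** i for i in range(4)) + 4 * 1 + 4 * 2  # 27 units
print(ref, acc, c_full_binary, c_hybrid)
```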

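The single-ADC ReLU optimization can be sketched behaviorally as well. The excerpt does not describe the ADC topology in detail, so the sketch below assumes a SAR-style conversion purely for illustration: the sign is resolved first, a negative pre-activation stops the conversion immediately and outputs zero (the ReLU result), and only non-negative values pay for the full conversion. With pre-activations roughly symmetric around zero, the average comparator-cycle count drops by nearly half, consistent with the claimed ADC energy saving.

```python
import numpy as np

rng = np.random.default_rng(1)

def sar_relu_adc(v, n_bits=8, v_ref=1.0):
    """Behavioral SAR-style ADC with ReLU folded in: resolve the sign first and
    early-stop to zero for negative inputs.  Returns (code, comparator_cycles).
    Assumes v lies in [-v_ref, +v_ref); the real ADC design may differ."""
    cycles = 1                                # sign-bit comparison
    if v < 0:
        return 0, cycles                      # ReLU: negative pre-activation -> 0
    code = 0
    for i in range(n_bits - 1, -1, -1):       # resolve magnitude bits, MSB first
        trial = code | (1 << i)
        if v >= trial * v_ref / (1 << n_bits):
            code = trial
        cycles += 1
    return code, cycles

# With pre-activation values roughly symmetric around zero, about half of the
# conversions stop after a single comparison.
vals = rng.uniform(-1.0, 1.0, size=10_000)
avg_cycles = np.mean([sar_relu_adc(v)[1] for v in vals])
print(f"average comparator cycles: {avg_cycles:.2f} (vs. 9 without early stop)")
```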
The fabricated 65nm test chip achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, while maintaining 88.6% accuracy on the CIFAR-10 dataset.
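The excerpt does not spell out the output-based fine-tune procedure, so the third sketch below shows one common way such compensation can be done, as an assumption rather than the paper's exact method: model the measured CAAT+ADC transfer non-linearity as a differentiable function, insert it after each in-memory layer's MAC, and fine-tune the pretrained weights so they absorb the distortion. The cubic curve, layer sizes, data, and training settings are all placeholders.

```python
import torch
import torch.nn as nn

# Stand-in for the measured non-linear transfer of the analog MAC + ADC path.
# On silicon this curve would come from chip characterization; a mild cubic
# bend is used here purely for illustration.
def cim_transfer(y):
    return y + 0.02 * y.pow(3)

class CiMLinear(nn.Linear):
    """Linear layer whose output passes through the differentiable model of the
    CiM non-linearity, so fine-tuning sees the distortion the hardware applies."""
    def forward(self, x):
        return cim_transfer(super().forward(x))

# Output-based fine-tune sketch: start from (notionally pretrained) weights and
# train briefly with the non-linear forward model in place, letting the weights
# and biases compensate for the distortion.
model = nn.Sequential(CiMLinear(16, 8), nn.ReLU(), CiMLinear(8, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(256, 16)            # placeholder calibration batch
labels = torch.randint(0, 4, (256,))     # placeholder labels
for _ in range(20):                      # short fine-tune loop
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
```

Because the distortion is modeled in the training graph rather than corrected in hardware, this style of compensation needs no extra analog calibration circuitry; only the network parameters change.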

Statistics
The proposed CiM macro achieves 51.2 GOPS throughput at 1.0 GHz clock frequency. At 240 MHz clock frequency, it achieves 10.3 TOPS/W energy efficiency.
Quotes
"This work addresses this question and makes contribution shown in Fig. 1(c): a high-throughput SRAM-based ReLU-optimized CD-CiM architecture enabled by a two-level compact charge-domain analog adder tree (CAAT) and ReLU-optimized one-ADC sensing interface for the entire CiM array of all parallel CiM banks." "It could complete MAC+ReLU computing of two signed 8b vectors in one CiM cycle with one-time A/D conversion, leading to 8x ADC-related area and energy savings compared with prior works."

Deeper Questions

How can the proposed CiM architecture be extended to support even higher bit-width activations and weights while maintaining the high throughput and energy efficiency?

To extend the proposed CiM architecture to higher bit-width activations and weights while preserving throughput and energy efficiency, several strategies can be combined. The design can be scaled by adding stages to the binary-C-2C capacitor network and raising the ADC resolution to cover the wider output range, and by replicating CiM banks and enlarging the SRAM CiM array to hold the larger operands. The capacitor network would need to be re-sized so that the extra stages do not degrade linearity or charge-sharing accuracy, and additional parallel CiM banks can preserve per-cycle throughput despite the wider data. Finally, migrating to a more advanced process node can offset the added switching capacitance and keep energy efficiency on target.

What are the potential challenges and trade-offs in applying this CiM design to different neural network models and applications beyond image classification?

When applying this CiM design to different neural network models and applications beyond image classification, several challenges and trade-offs may arise. One challenge is adapting the architecture to handle diverse data types and operations required by various neural network models, such as recurrent neural networks or transformer models. This may involve reconfiguring the CiM structure to support different types of computations and data formats. Trade-offs may include balancing the area and energy efficiency requirements of the CiM design with the computational demands of complex neural network tasks. Additionally, ensuring compatibility with different training and inference algorithms, as well as addressing the scalability of the architecture for larger and more complex models, are crucial considerations. Furthermore, optimizing the fine-tune compensation scheme for different applications and datasets to maintain high inference accuracy is essential.

What other types of in-memory computing architectures could benefit from the hybrid binary-C-2C capacitor network and the output-based fine-tune compensation scheme proposed in this work?

The hybrid binary-C-2C capacitor network and the output-based fine-tune compensation scheme proposed in this work could benefit other in-memory computing architectures, such as other charge- or current-domain analog CiM designs and resistive (e.g., ReRAM-based) CiM. In analog CiM, the hybrid capacitor network keeps analog summation accurate while sharply reducing total capacitor area, enabling higher-precision accumulation at lower cost, and the output-based fine-tune scheme can absorb residual circuit non-linearities during training, improving precision and reliability. In resistive CiM, a similar capacitor network could be used in the readout and analog-to-digital conversion path to improve conversion accuracy and area efficiency, while the output-based fine-tune scheme could compensate device and circuit non-idealities, improving robustness. Overall, both techniques generalize to mixed-signal in-memory architectures well beyond the specific macro proposed here.