This paper proposes a novel compute-in-memory (rCiM) architecture based on resonant SRAM and develops an automated tool for exploring application-specific rCiM designs, aiming to minimize energy consumption and latency.
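The summary names the tool's goal but not its mechanics, so the sketch below is a hypothetical illustration of what an automated rCiM design-space exploration loop could look like: enumerate candidate configurations under placeholder cost models and keep the one minimizing the energy-delay product. Every knob and coefficient here is invented for illustration, not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class RCiMConfig:
    # Hypothetical design knobs; the real tool's parameters may differ.
    banks: int
    word_bits: int
    resonant_stages: int

def estimate_energy_nj(cfg: RCiMConfig, workload_ops: int) -> float:
    # Placeholder analytical model: more resonant stages recover more
    # energy per access, while extra banks add leakage overhead.
    per_op = 1.0 / cfg.resonant_stages + 0.01 * cfg.banks
    return per_op * workload_ops * cfg.word_bits / 8

def estimate_latency_us(cfg: RCiMConfig, workload_ops: int) -> float:
    # Placeholder: banks provide parallelism, resonant clocking
    # lengthens the cycle slightly.
    return workload_ops / (cfg.banks * 1e3) * (1 + 0.05 * cfg.resonant_stages)

def explore(workload_ops: int) -> RCiMConfig:
    """Exhaustive search minimizing the energy-delay product."""
    space = product([2, 4, 8], [8, 16], [1, 2, 4])
    return min(
        (RCiMConfig(b, w, s) for b, w, s in space),
        key=lambda c: estimate_energy_nj(c, workload_ops)
                      * estimate_latency_us(c, workload_ops),
    )

print(explore(workload_ops=1_000_000))
```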
A distributed Union-Find decoder that can exploit parallel computing resources to achieve sublinear average time complexity with respect to the surface code distance, enabling faster error correction for large surface codes.
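For reference, the sequential primitive at the heart of any Union-Find decoder is the classic disjoint-set structure, sketched below with path compression and union by rank; the paper's actual contribution, distributing this work across parallel computing resources, is not attempted in this minimal single-threaded sketch.

```python
class DisjointSet:
    """Union-Find with path compression and union by rank, the
    sequential primitive underlying Union-Find decoders."""
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point visited nodes at their grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same cluster
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

# Merging syndrome clusters as defects are linked:
ds = DisjointSet(8)
ds.union(0, 1); ds.union(1, 2)
assert ds.find(0) == ds.find(2)
```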
This work presents NMO, a multi-level memory-centric profiling tool that leverages ARM's Statistical Profiling Extension (SPE) to enable precise memory access tracing on ARM processors. It provides the first quantitative assessment of ARM SPE's time overhead and sampling accuracy for memory-centric profiling across different sampling periods and aux buffer sizes.
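A hedged sketch of the post-processing step such a profiler performs: aggregating sampled memory-access records into a per-page heat map. The record layout below is a hypothetical simplification; real ARM SPE records carry many more fields (event type, latency, data source) and are decoded from the perf aux buffer.

```python
from collections import Counter
from typing import Iterable, NamedTuple

PAGE_SHIFT = 12  # 4 KiB pages

class Sample(NamedTuple):
    # Hypothetical simplified record; real SPE samples also include
    # latency, data source, and event flags.
    virt_addr: int
    is_load: bool

def page_heatmap(samples: Iterable[Sample]) -> Counter:
    """Count sampled accesses per virtual page.

    With a sampling period of N, each sample statistically stands in
    for ~N real accesses, so relative (not absolute) counts are what
    a sampling profiler can report.
    """
    heat = Counter()
    for s in samples:
        heat[s.virt_addr >> PAGE_SHIFT] += 1
    return heat

samples = [Sample(0x1000 + i * 64, True) for i in range(10)] + \
          [Sample(0x20000, False) for _ in range(3)]
for page, n in page_heatmap(samples).most_common():
    print(f"page 0x{page:x}: {n} sampled accesses")
```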
A cycle-accurate performance model was developed to guide the implementation of a superscalar version of the open-source CVA6 RISC-V processor, resulting in a 40% performance improvement on the CoreMark benchmark.
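A hedged sketch of the simplest kind of issue-width model one could use before committing to RTL: replay an instruction trace and count cycles under single- versus dual-issue assumptions. The dependency rule here (no same-cycle issue when a later slot reads an earlier slot's destination) is a deliberate simplification; CVA6's real microarchitectural constraints are much richer.

```python
from typing import List, NamedTuple, Optional

class Instr(NamedTuple):
    dest: Optional[str]
    srcs: tuple

def cycles(trace: List[Instr], width: int) -> int:
    """Count cycles to issue a trace at a given issue width.

    Simplification: up to `width` instructions issue per cycle unless a
    later slot reads the destination of an earlier same-cycle slot.
    """
    total, i = 0, 0
    while i < len(trace):
        issued, written = 0, set()
        while (i < len(trace) and issued < width
               and not (set(trace[i].srcs) & written)):
            if trace[i].dest:
                written.add(trace[i].dest)
            issued += 1
            i += 1
        total += 1
    return total

trace = [
    Instr("x1", ("x2",)), Instr("x3", ("x4",)),   # independent pair
    Instr("x5", ("x1",)), Instr("x6", ("x5",)),   # dependent chain
]
base, dual = cycles(trace, 1), cycles(trace, 2)
print(f"single-issue: {base} cycles, dual-issue: {dual} cycles, "
      f"speedup {base / dual:.2f}x")
```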
Relic, a specialized software-only framework, enables significant performance improvements over state-of-the-art parallel programming frameworks for fine-grained tasks on simultaneous multithreading (SMT) CPU cores.
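This is not Relic's API, which the summary does not describe; it is a conceptual illustration, in Python for consistency with the other sketches, of the handoff pattern fine-grained SMT frameworks rely on: a worker thread spinning on a shared mailbox so a sibling hardware thread can hand it a task without involving the OS scheduler. A real framework would do this in native code with threads pinned to sibling hyperthreads.

```python
import threading

class Mailbox:
    """Single-slot handoff: one producer posts closures, one worker spins.

    Conceptual only: Python's GIL prevents true SMT parallelism; a real
    fine-grained framework would pin native threads to sibling
    hyperthreads and use a lock-free slot.
    """
    def __init__(self):
        self._task = None
        self._done = threading.Event()
        self._stop = False

    def post(self, fn):
        self._done.clear()
        self._task = fn          # visible to the spinning worker
        self._done.wait()        # block until the sibling finishes

    def serve(self):
        while not self._stop:
            task = self._task
            if task is not None:
                self._task = None
                task()
                self._done.set()

    def shutdown(self):
        self._stop = True

mb = Mailbox()
worker = threading.Thread(target=mb.serve, daemon=True)
worker.start()
results = []
mb.post(lambda: results.append(sum(range(100))))
mb.shutdown()
print(results)  # [4950]
```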
A scalable and programmable Look-Up Table (LUT)-based Neural Accelerator (LUT-NA) framework that employs a divide-and-conquer approach to overcome the scalability limitations of traditional LUT-based techniques, and utilizes mixed-precision analysis to further reduce energy and area consumption without significant accuracy loss.
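As an illustration of the divide-and-conquer idea (under assumed chunk sizes and quantization levels, since the summary gives neither), the sketch below replaces one infeasibly large dot-product LUT with several small per-chunk LUTs whose outputs are summed. Mixed precision would correspond to choosing different `levels` per layer or chunk.

```python
import itertools

def build_sub_luts(weights, chunk, levels):
    """Precompute, for each weight chunk, the dot product against every
    possible quantized input chunk: k small LUTs of size levels**chunk
    instead of one LUT of size levels**len(weights)."""
    luts = []
    for start in range(0, len(weights), chunk):
        w = weights[start:start + chunk]
        lut = {xs: sum(x * wi for x, wi in zip(xs, w))
               for xs in itertools.product(range(levels), repeat=len(w))}
        luts.append(lut)
    return luts

def lut_dot(luts, x, chunk):
    """Dot product via table lookups: one lookup per chunk, then add."""
    return sum(lut[tuple(x[i * chunk:(i + 1) * chunk])]
               for i, lut in enumerate(luts))

weights = [0.5, -1.0, 2.0, 0.25]   # hypothetical pre-trained weights
chunk, levels = 2, 4               # 2-element chunks, 2-bit activations
luts = build_sub_luts(weights, chunk, levels)
x = [3, 1, 0, 2]                   # quantized activations in [0, levels)
assert lut_dot(luts, x, chunk) == sum(a * w for a, w in zip(x, weights))
```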
The proposed platform enables the integration of approximate circuits at the core level with diverse structures, accuracies, and timings without requiring modifications to the core, particularly in the control logic. It introduces novel control features, allowing configurable trade-offs between accuracy and energy consumption based on specific application requirements.
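A hedged illustration of the kind of accuracy/energy knob such a platform exposes, using a lower-part-OR adder as a stand-in approximate circuit; the paper's actual circuits and control interface are not specified in this summary. The `approx_bits` parameter plays the role of the configurable trade-off: more approximated bits means less carry logic (and, in hardware, less energy) at the cost of accuracy.

```python
def approx_add(a: int, b: int, approx_bits: int) -> int:
    """LOA-style approximate addition: the low `approx_bits` are ORed
    instead of added, so no carry propagates out of the approximate
    region. `approx_bits` is the runtime accuracy/energy knob."""
    if approx_bits == 0:
        return a + b                          # exact mode
    mask = (1 << approx_bits) - 1
    low = (a | b) & mask                      # cheap, carry-free low part
    high = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return high | low

for k in (0, 2, 4):
    s = approx_add(0b101101, 0b011011, k)
    print(f"approx_bits={k}: {s} (exact {0b101101 + 0b011011})")
```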
A novel streaming architecture with hybrid computing engines and a balanced dataflow strategy is proposed to efficiently accelerate lightweight convolutional neural networks by minimizing on-chip memory overhead and off-chip memory access while enhancing computational efficiency.
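The summary names the dataflow goals but not their details; as a minimal stand-in, the sketch below shows the window-buffer idea streaming CNN accelerators use to bound on-chip memory, reduced to 1-D for brevity: each input element is read from "off-chip" exactly once, and on-chip state is only a kernel-sized window.

```python
from collections import deque

def stream_conv1d(pixels, kernel):
    """Streaming 1-D sliding-window dot product: one read per input
    element, with on-chip state limited to len(kernel) samples, the
    kind of buffering a streaming accelerator uses to cut off-chip
    memory traffic."""
    window = deque(maxlen=len(kernel))
    for p in pixels:                # single pass over the input stream
        window.append(p)
        if len(window) == len(kernel):
            yield sum(w * k for w, k in zip(window, kernel))

print(list(stream_conv1d([1, 2, 3, 4, 5], [1, 0, -1])))  # [-2, -2, -2]
```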