Core Concept
Developing specialized hardware accelerators to efficiently process and analyze graph-structured data using Graph Neural Networks (GNNs).
Summary
This dissertation focuses on enabling accelerators for efficient graph computing, with a particular emphasis on Graph Neural Networks (GNNs). The key insights and contributions are as follows:
GNN Workload Characterization:
Developed a comprehensive benchmark suite, GNNMark, to analyze the architectural requirements and performance characteristics of various GNN workloads.
Profiled GNN computations to identify key bottlenecks, such as irregular memory access patterns, computational irregularity, and sparsity challenges.
Provided detailed insights into the execution time breakdown, instruction mix, cache behavior, and scalability of GNN training on multi-GPU systems.
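The irregular memory behavior profiled above comes from the gather step of message passing: each node pulls features from neighbors scattered across memory. A minimal sketch of that aggregation pattern (the toy graph, feature values, and variable names are illustrative, not taken from GNNMark):

```python
# Toy graph: 5 nodes; edges stored as (src, dst) pairs (COO-style).
# Node degrees are uneven, which drives the workload imbalance and
# irregular memory accesses described above.
edges = [(0, 1), (2, 1), (3, 1), (4, 1), (0, 2), (1, 3)]
num_nodes, feat_dim = 5, 2

# Simple per-node feature vectors (stand-ins for learned embeddings).
h = [[float(i), float(i) + 0.5] for i in range(num_nodes)]

# Mean-aggregation step of message passing: each destination node
# gathers features from its scattered source neighbors. The h[src]
# reads jump around memory -- the locality problem the profiling
# highlights -- and node 1's degree (4) dwarfs the others, causing
# load imbalance in a parallel setting.
agg = [[0.0] * feat_dim for _ in range(num_nodes)]
deg = [0] * num_nodes
for src, dst in edges:
    for d in range(feat_dim):
        agg[dst][d] += h[src][d]
    deg[dst] += 1
for v in range(num_nodes):
    if deg[v]:
        agg[v] = [x / deg[v] for x in agg[v]]
```

Real GNN frameworks fuse this gather/scatter into sparse-matrix kernels, which is why SpGEMM performance (addressed next) dominates end-to-end training time.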
SMASH: GNN Algorithmic Acceleration:
Proposed SMASH, a novel sparse general matrix-matrix multiplication (SpGEMM) kernel optimized for a custom accelerator.
Introduced the MAP-CSR storage format and various versions of the SMASH algorithm, including atomic hashing, tokenization, and pipelining, to address the challenges of SpGEMM computations.
Demonstrated significant performance improvements over state-of-the-art SpGEMM implementations.
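The core difficulty SMASH tackles is accumulating partial products for output rows whose sparsity pattern is unknown in advance; hash-map accumulators are the standard software answer. The sketch below shows that generic row-wise (Gustavson-style) hash accumulation over CSR arrays. It is an illustration of the accumulation pattern only, not the MAP-CSR format or the atomic/tokenized/pipelined SMASH variants themselves:

```python
def spgemm_hash(a_indptr, a_indices, a_data,
                b_indptr, b_indices, b_data):
    """Row-wise SpGEMM C = A * B with a per-row hash accumulator.

    Inputs are the three CSR arrays of each matrix. For each row i
    of A, partial products a[i,j] * b[j,k] are merged in a hash map
    keyed by output column k -- the step SMASH accelerates in
    hardware with (atomic) hashing units.
    """
    c_indptr, c_indices, c_data = [0], [], []
    for i in range(len(a_indptr) - 1):
        acc = {}  # hash-map accumulator for row i of C
        for jj in range(a_indptr[i], a_indptr[i + 1]):
            j, a_val = a_indices[jj], a_data[jj]
            for kk in range(b_indptr[j], b_indptr[j + 1]):
                k = b_indices[kk]
                acc[k] = acc.get(k, 0.0) + a_val * b_data[kk]
        for k in sorted(acc):  # emit row i in column order
            c_indices.append(k)
            c_data.append(acc[k])
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices, c_data
```

In hardware, the serial `acc` updates become concurrent insertions, which is why collision handling (atomic hashing, tokenization) and pipelining are the central design points listed above.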
NeuraChip: GNN Hardware Acceleration:
Designed NeuraChip, a custom Coarse-Grained Reconfigurable Array (CGRA) accelerator architecture tailored for GNN workloads.
Incorporated key architectural features, such as tiled Gustavson's algorithm, on-chip dataflow, and dynamically reseeding hash-based mapping, to address the unique computational and memory requirements of GNNs.
Evaluated NeuraChip's performance against CPU, GPU, and state-of-the-art sparse matrix accelerators, showcasing significant speedups across a range of GNN workloads.
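The "dynamically reseeding hash-based mapping" idea can be pictured in software as hashing each task (e.g., an output-row index) to a compute resource, then picking a new seed whenever the resulting load is too skewed. The sketch below is a hypothetical software analogue under assumed names (`map_tasks`, `remap_if_imbalanced`, `max_skew`); NeuraChip performs the equivalent mapping with on-chip hash units:

```python
import hashlib
from collections import Counter

def map_tasks(task_ids, num_pes, seed):
    """Hash each task id to a processing element (PE) index."""
    assign = {}
    for t in task_ids:
        digest = hashlib.blake2b(f"{seed}:{t}".encode(),
                                 digest_size=4).digest()
        assign[t] = int.from_bytes(digest, "little") % num_pes
    return assign

def remap_if_imbalanced(task_ids, num_pes, seed=0, max_skew=2.0):
    """Reseed the hash until no PE holds more than max_skew times
    the average load -- illustrating the 'dynamically reseeding'
    idea, which avoids persistent hot spots from an unlucky seed."""
    while True:
        assign = map_tasks(task_ids, num_pes, seed)
        loads = Counter(assign.values())
        if max(loads.values()) <= max_skew * (len(task_ids) / num_pes):
            return assign, seed
        seed += 1  # reseed and try a fresh mapping
```

Tiling Gustavson's algorithm bounds each such task's working set, so the mapped work fits the on-chip dataflow; the evaluation cited above measures the combined effect against CPU, GPU, and prior sparse accelerators.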
The comprehensive approach of this dissertation, encompassing workload characterization, algorithmic optimizations, and custom hardware design, provides a holistic solution to enable efficient acceleration of GNN computations.
Statistics
"Graph computations inherently possess a rich vein of task-level parallelism, given that many operations can theoretically be conducted simultaneously across different nodes or subgraphs."
"The lack of data spatial locality implies that successive operations might access data dispersed across memory, leading to increased cache misses and degraded memory performance."
"Computational irregularity in graph algorithms surfaces due to the diverse node degrees and edge distributions, causing workload imbalance in parallel computing scenarios."
Quotations
"Graphs have seen a growing role in modern computational domains. With the rise of vast amounts of complex, interconnected data, traditional data processing methods have often fallen short."
"As promising as GNNs are, they are not without their computational challenges. Given the inherently recursive nature of GNNs, coupled with the irregular structure of many real-world graphs, we find significant bottlenecks in terms of their scalability and performance."