Key Concept
A high-performance HGNN accelerator, HiHGNN, is proposed to alleviate execution bottlenecks and exploit the high-degree parallelism and data reusability in HGNNs.
Abstract
The content discusses the design of HiHGNN, a high-performance accelerator for Heterogeneous Graph Neural Networks (HGNNs).
Key highlights:
- Characterization of HGNN models on GPU reveals that different stages exhibit diverse execution bounds, leading to unbalanced utilization across hardware components.
- Proposed a bound-aware stage-fusion methodology, including a novel programming model and hardware datapath, to fuse and pipeline the execution of stages with different bounds.
- Designed an independency-aware parallel execution to exploit the high-degree inter-semantic-graph parallelism, involving scale-up optimization and workload-aware scheduling.
- Proposed a similarity-aware execution scheduling to maximize the reuse of intermediate results across the processing of semantic graphs.
- Compared to state-of-the-art software frameworks on GPU, HiHGNN achieves an average 40.0× and 8.3× speedup as well as 99.59% and 99.74% energy reduction, respectively.
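To make the similarity-aware scheduling idea concrete, here is a minimal sketch (not HiHGNN's actual hardware scheduler): semantic graphs are greedily ordered so that consecutive graphs share as many vertex/edge types as possible, so intermediate results for the shared types can be reused. The metapath names and type sets below are illustrative assumptions, not data from the paper.

```python
# Hypothetical sketch of similarity-aware execution scheduling:
# greedily order semantic graphs so that consecutive graphs overlap most,
# maximizing reuse of intermediate results for shared vertex types.

def jaccard(a: set, b: set) -> float:
    """Overlap between the type sets two semantic graphs touch."""
    return len(a & b) / len(a | b) if a | b else 0.0

def schedule(graphs: dict) -> list:
    """Greedy nearest-neighbor ordering by type-set similarity."""
    remaining = dict(graphs)
    name, types = remaining.popitem()  # start from an arbitrary graph
    order = [name]
    while remaining:
        # Pick the unscheduled graph most similar to the last scheduled one.
        name, types = max(remaining.items(),
                          key=lambda kv: jaccard(types, kv[1]))
        del remaining[name]
        order.append(name)
    return order

# Illustrative metapath-induced semantic graphs in an academic HetG.
semantic_graphs = {
    "PAP": {"paper", "author"},
    "PSP": {"paper", "subject"},
    "APA": {"author", "paper"},
}
print(schedule(semantic_graphs))
```

A real scheduler would measure similarity over the actual intermediate tensors (e.g., projected feature matrices) rather than type sets, but the greedy-ordering structure is the same.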
Statistics
The CUDA kernels in the FP stage are generally compute-bound, achieving over 95% of peak performance.
The SpMMCsr kernel in the NA stage exhibits high DRAM bandwidth utilization (74.3%) but a low L2 cache hit rate (31.4%) due to irregular memory accesses.
The sgemm kernel in the SF stage is likewise compute-bound, achieving 84.2% of peak performance, while the uEleWise, Reduce, and Concat kernels are memory-bound, with over 80% DRAM bandwidth utilization.
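The profiling numbers above amount to a roofline-style classification: a kernel is labeled by whichever resource it saturates. A minimal sketch of that labeling rule follows; the 50% saturation threshold and the unquoted counterpart percentages are assumptions for illustration, not measurements from the paper.

```python
# Roofline-style kernel classification (illustrative sketch).
# Quoted figures come from the characterization above; the other
# percentage in each pair and the 50% threshold are assumptions.

def classify(peak_perf_pct: float, dram_bw_pct: float) -> str:
    """Label a kernel by the resource it saturates most."""
    if peak_perf_pct >= dram_bw_pct:
        return "compute-bound" if peak_perf_pct > 50.0 else "under-utilized"
    return "memory-bound" if dram_bw_pct > 50.0 else "under-utilized"

kernels = {
    "FP gemm":    (95.0, 20.0),  # >95% peak perf (quoted); BW assumed
    "NA SpMMCsr": (10.0, 74.3),  # 74.3% DRAM BW (quoted); perf assumed
    "SF sgemm":   (84.2, 30.0),  # 84.2% peak perf (quoted); BW assumed
    "SF Reduce":  (5.0, 80.0),   # >80% DRAM BW (quoted); perf assumed
}
for name, (perf, bw) in kernels.items():
    print(f"{name}: {classify(perf, bw)}")
```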
Quotes
"HGNNs have achieved excellent prediction accuracy in the processing of HetG and become at the heart of a broad range of critical fields [10], [61], [70], [78] such as recommendation systems [11], [29], medical analysis [39], knowledge inference [3], [55], [60], malicious account detection [38], information retrieval [41], shop search [42], etc."
"To capture both the structural information and semantic information in HetGs, most prevalent HGNN models usually contain four major execution stages as shown in Fig. 1."