Core Concept
Hector is a novel two-level intermediate representation and code generation framework that systematically addresses the performance and programming challenges of implementing relational graph neural networks (RGNNs) on GPU architectures.
Summary
The paper proposes Hector, a novel two-level intermediate representation (IR) and code generation framework, to address the performance and programming challenges of implementing relational graph neural networks (RGNNs) on GPU architectures.
Key highlights:
- The higher-level inter-operator level IR captures the key properties of RGNN models and opportunities to reduce memory accesses in inter-operator scheduling and materialization.
- The lower-level intra-operator level IR provides the facility to express specialized templates and lower them to CUDA kernels, decoupling model semantics, data layout, and operator-specific optimizations.
- Hector generates code with flexible data access schemes, eliminating redundant data copies and avoiding the need for temporary weight tensors.
- Hector achieves up to 9.9x speed-up in inference and up to 43.7x speed-up in training compared to state-of-the-art systems on select RGNN models and datasets.
- Hector further optimizes performance through compact tensor materialization and linear operator reordering, obtaining up to 3.8x additional speed-up in inference and 2.7x in training over its basic generated code.
- The two-level IR design enables Hector to express model semantics, data layout, and operator-specific schedules in a decoupled manner, reducing programming effort.
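To make the linear operator reordering idea concrete, here is a minimal NumPy sketch of the underlying principle for an RGCN-style layer. All names (`H`, `W`, `edges`) are illustrative, not Hector's actual API: applying a relation's weight matrix once per node and then gathering per edge yields the same messages as gathering per-edge source features first and then multiplying, but the matrix multiplication cost scales with the number of nodes rather than the (typically much larger) number of edges.

```python
import numpy as np

# Toy setup: one relation's weight applied to source-node features per edge.
rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 5, 4, 3
H = rng.standard_normal((num_nodes, in_dim))        # node features
W = rng.standard_normal((in_dim, out_dim))          # one relation's weight
edges = np.array([[0, 1], [2, 1], [3, 4], [0, 4]])  # (src, dst) pairs

# Schedule A: materialize per-edge source features, then multiply.
# The matmul cost scales with the number of edges.
msgs_naive = H[edges[:, 0]] @ W

# Schedule B (reordered): apply the linear operator once per node,
# then gather. The matmul cost scales with the number of nodes.
HW = H @ W
msgs_reordered = HW[edges[:, 0]]

# Both schedules produce identical messages.
assert np.allclose(msgs_naive, msgs_reordered)
```

This is the kind of schedule choice that a decoupled IR can make without touching model semantics: the operator semantics stay fixed while the materialization and ordering change.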
Statistics
The paper reports the following key metrics:
Up to 9.9x speed-up in inference compared to state-of-the-art systems.
Up to 43.7x speed-up in training compared to state-of-the-art systems.
Up to 3.8x additional speed-up in inference (2.7x in training) from compact tensor materialization and linear operator reordering, relative to Hector's basic generated code.
Quotations
"Hector achieves up to 9.9× speed-up in inference and up to 43.7× speed-up in training compared to the best among the state-of-the-art systems [9, 35, 36] when running RGCN, RGAT, and HGT [2, 13, 31] on heterogeneous datasets provided by DGL and Open Graph Benchmark (OGB) packages [1, 4–6, 11, 32]."
"Hector further optimizes performance through compact tensor materialization and linear operator reordering, obtaining up to 3.8× additional speed-up in inference and 2.7× speed-up in training compared to our basic generated code."