Key Idea
Hector is a novel two-level intermediate representation and code generation framework that systematically addresses the performance and programming challenges of implementing relational graph neural networks (RGNNs) on GPU architectures.
Abstract
The paper proposes Hector, a novel two-level intermediate representation (IR) and code generation framework, to address the performance and programming challenges of implementing relational graph neural networks (RGNNs) on GPU architectures.
Key highlights:
- The higher-level, inter-operator IR captures the key properties of RGNN models and exposes opportunities to reduce memory accesses through inter-operator scheduling and materialization decisions.
- The lower-level, intra-operator IR provides facilities to express operator templates, specialize them, and lower them to CUDA kernels, decoupling model semantics, data layout, and operator-specific optimizations.
- Hector generates code with flexible data access schemes to eliminate redundant data copies and avoids the need for temporary weight tensors.
- Hector achieves up to 9.9x speed-up in inference and up to 43.7x speed-up in training compared to state-of-the-art systems on select RGNN models and datasets.
- Hector further optimizes performance through compact tensor materialization and linear operator reordering, obtaining up to 3.8x additional speed-up.
- The two-level IR design enables Hector to express model semantics, data layout, and operator-specific schedules in a decoupled manner, reducing programming effort.
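The linear operator reordering mentioned above can be illustrated with a small sketch. This is a hypothetical NumPy example, not Hector's actual IR or API: in an RGNN layer such as RGAT, each edge of relation r transforms its source node's feature by a relation-specific weight W_r. Naively this performs one matmul per edge; reordering the linear operator to run once per (source node, relation) pair that actually occurs produces the same result with fewer matmuls and no redundant per-edge temporaries.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, in_dim, out_dim = 4, 8, 8
h = rng.normal(size=(num_nodes, in_dim))       # node features
W = {0: rng.normal(size=(in_dim, out_dim)),    # per-relation weight matrices
     1: rng.normal(size=(in_dim, out_dim))}
# Edges as (src, dst, relation); src 0 appears twice under relation 0.
edges = [(0, 1, 0), (0, 2, 0), (1, 2, 1), (3, 0, 1)]

# Naive schedule: one matmul per edge (redundant work for repeated
# (src, relation) pairs, and one temporary per edge).
naive = [h[s] @ W[r] for s, _, r in edges]

# Reordered schedule: compute each distinct (src, relation) product once,
# then gather the results along the edges.
unique_pairs = {(s, r) for s, _, r in edges}
cache = {(s, r): h[s] @ W[r] for (s, r) in unique_pairs}
reordered = [cache[(s, r)] for s, _, r in edges]

# Both schedules yield identical per-edge messages.
assert all(np.allclose(a, b) for a, b in zip(naive, reordered))
print(len(edges), "edge matmuls reduced to", len(unique_pairs))
```

The same idea underlies compact tensor materialization: per-edge tensors that are duplicates along (node, relation) pairs need not be materialized at full edge-count size.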
Statistics
The paper reports the following key metrics:
- Up to 9.9x speed-up in inference compared to state-of-the-art systems.
- Up to 43.7x speed-up in training compared to state-of-the-art systems.
- Up to 3.8x additional speed-up from compact tensor materialization and linear operator reordering.
Quotes
"Hector achieves up to 9.9× speed-up in inference and up to 43.7× speed-up in training compared to the best among the state-of-the-art systems [9, 35, 36] when running RGCN, RGAT, and HGT [2, 13, 31] on heterogeneous datasets provided by DGL and Open Graph Benchmark (OGB) packages [1, 4–6, 11, 32]."
"Hector further optimizes performance through compact tensor materialization and linear operator reordering, obtaining up to 3.8× additional speed-up in inference and 2.7× speed-up in training compared to our basic generated code."