The paper explores the performance of fine-grained task parallelism on simultaneous multithreading (SMT) CPU cores. It first analyzes the performance of seven state-of-the-art shared-memory parallel programming frameworks, including OpenMP, Intel oneAPI Threading Building Blocks, OpenCilk, and Taskflow, using real-world fine-grained application kernels such as graph algorithms and JSON parsing. The results show that the existing frameworks degrade performance on several fine-grained tasks.
To address this, the paper introduces Relic, a specialized parallel programming framework designed to enable extremely fine-grained task parallelism on SMT cores. Relic uses a simple single-producer single-consumer task scheduling mechanism and optimized waiting/suspension mechanisms to reduce task handling overheads. The evaluation shows that Relic outperforms the state-of-the-art frameworks, with speedups ranging from 19.1% to 33.2% across the investigated benchmarks.
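To make the single-producer single-consumer idea concrete, here is a minimal C++ sketch of a bounded lock-free ring buffer shared by one producer thread and one consumer thread. It is not Relic's implementation: the names `SpscQueue`, `Task`, `square`, and `run_tasks` are hypothetical, the capacity of 64 is arbitrary, and plain `std::this_thread::yield()` stands in for the optimized waiting/suspension mechanisms, which the summary does not detail.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical bounded SPSC ring buffer (illustration only, not Relic's code).
// Exactly one thread may call try_push and exactly one thread may call try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side: returns false when the queue is full.
    bool try_push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == Capacity) return false;
        slots_[tail & (Capacity - 1)] = item;
        // Release publishes the slot write to the consumer.
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head & (Capacity - 1)];
        // Release tells the producer that the slot may be reused.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to pop
    std::atomic<std::size_t> tail_{0};  // next slot to push
};

// A very small task: a function pointer plus one argument (assumed shape).
struct Task {
    std::uint64_t (*fn)(std::uint64_t);
    std::uint64_t arg;
};

inline std::uint64_t square(std::uint64_t x) { return x * x; }

// The calling thread produces n tiny tasks; one helper thread consumes them
// and accumulates the results. Returns the sum of squares of 1..n.
std::uint64_t run_tasks(std::uint64_t n) {
    SpscQueue<Task, 64> queue;
    std::uint64_t sum = 0;

    std::thread consumer([&] {
        Task t{nullptr, 0};
        for (std::uint64_t done = 0; done < n;) {
            if (queue.try_pop(t)) {
                assert(t.fn != nullptr);
                sum += t.fn(t.arg);
                ++done;
            } else {
                std::this_thread::yield();  // placeholder waiting strategy
            }
        }
    });

    for (std::uint64_t i = 1; i <= n; ++i) {
        while (!queue.try_push(Task{square, i})) std::this_thread::yield();
    }

    consumer.join();  // join makes the consumer's writes to sum visible here
    return sum;
}
```

Because each index has a single writer, the queue needs no locks or compare-and-swap loops, only acquire/release loads and stores, which is one plausible reason an SPSC design keeps per-task overhead low. Calling `run_tasks(10)` returns 385, the sum of the squares of 1 through 10.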
by Denis Los, I... at arxiv.org, 10-03-2024
https://arxiv.org/pdf/2410.01222.pdf