Core Concepts
Libfork enables fully-portable continuation stealing with stackless coroutines, achieving optimal time/memory scaling.
Abstract
Fully-strict fork-join parallelism is powerful for shared-memory programming.
Implementing continuation-stealing in traditional HPC languages is challenging.
Libfork combines coroutines with segmented-stacks for fine-grained parallelism.
Libfork achieves optimal time/memory scaling and outperforms OpenMP (libomp) and Intel's TBB in both speed and memory consumption.
NUMA optimizations for its schedulers deliver performance matching that of busy-waiting schedulers.
The paper is structured as Introduction, Background, Libfork, and Experimental Evaluation.
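To make "fully-strict fork-join" concrete, here is a minimal sketch in standard C++ only. It does not use libfork's API and does not implement continuation stealing or segmented stacks; `fib`, `depth`, and `kMaxForkDepth` are hypothetical names chosen for illustration. It shows only the structural rule: every task a function forks is joined by that same function before it returns.

```cpp
#include <cstdint>
#include <future>

// Illustrative only: plain std::async, not libfork.
// Fully-strict fork-join: each forked child is joined by the
// parent that forked it, before the parent returns.
std::uint64_t fib(unsigned n, unsigned depth = 0) {
    if (n < 2) {
        return n;
    }

    // Hypothetical cutoff so the sketch spawns only a handful of threads;
    // past this depth the recursion runs serially.
    constexpr unsigned kMaxForkDepth = 3;
    if (depth >= kMaxForkDepth) {
        return fib(n - 1, depth) + fib(n - 2, depth);
    }

    // Fork: the child task runs on another thread.
    std::future<std::uint64_t> child =
        std::async(std::launch::async, fib, n - 1, depth + 1);

    // The parent keeps working on the rest of the function body.
    std::uint64_t rest = fib(n - 2, depth + 1);

    // Join: the parent waits for its own child before returning.
    return child.get() + rest;
}
```

Calling `fib(20)` returns 6765. In this sketch the forked child is handed to another thread while the parent continues. Continuation stealing inverts that: the forking worker runs the child immediately, and the remainder of the parent (its continuation) is what other workers may steal.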
Stats
"Compared to openMP (libomp), libfork is on average 7.2× faster and consumes 10× less memory."
"Compared to Intel’s TBB, libfork is on average 2.7× faster and consumes 6.2× less memory."
Quotes
"Libfork enables fully-portable continuation stealing and achieves optimal time/memory scaling."
"NUMA optimizations for schedulers demonstrate performance matching busy-waiting schedulers."