Core Concepts
CodeFlow is a unified programming model that leverages CXL and WASI to simplify heterogeneous programming: developers write multithreaded code in a single language and do not need to manage each accelerator explicitly.
Abstract
The paper presents CodeFlow, a unified programming model for heterogeneous computing systems. Heterogeneous systems integrate multiple types of specialized computing and memory devices to deliver higher performance, but programming such systems remains a critical limiting factor.
The key insights are:
Heterogeneous programming is complex due to the lack of cache coherence and the diversity of architectures, requiring specialized code and libraries for each accelerator.
The emergence of Compute Express Link (CXL), a cache-coherent interconnect protocol, and the WebAssembly System Interface (WASI), a standardized system interface for portable WebAssembly binaries, provides an opportunity to standardize heterogeneous programming.
CodeFlow leverages these technologies to enable a unified programming model:
Developers write multithreaded code in a high-level language (e.g., C++, Rust), which is compiled to a WASI binary.
The CodeFlow runtime system schedules the threads to run on suitable accelerators, handles memory sharing through CXL, and performs just-in-time compilation for the target architectures.
This approach allows developers to focus on high-level logic without worrying about device-specific implementations, simplifying the adoption of heterogeneous systems.
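The division of labor described above can be illustrated with a toy placement routine. This is only a sketch under stated assumptions: this summary does not describe the scheduling policy CodeFlow uses, so the `Device` and `ThreadTask` types, the `kind` hint, and the `schedule` function are hypothetical names introduced purely for illustration. The point it shows is that the task definitions contain no device-specific code, and the mapping to hardware is decided entirely by the runtime side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description (not part of CodeFlow)."""
    name: str
    kinds: frozenset  # workload kinds this device is assumed to handle well


@dataclass
class ThreadTask:
    """Hypothetical unit of work; carries no device-specific code."""
    name: str
    kind: str  # assumed hint such as "scalar" or "vector"


def schedule(tasks, devices, fallback):
    """Place each task on the first device that lists its kind, else on the fallback."""
    placement = {}
    for task in tasks:
        target = next((d for d in devices if task.kind in d.kinds), fallback)
        placement[task.name] = target.name
    return placement


cpu = Device("cpu", frozenset({"scalar", "vector"}))
gpu = Device("gpu", frozenset({"vector"}))

tasks = [
    ThreadTask("parse", "scalar"),
    ThreadTask("matmul", "vector"),
    ThreadTask("io", "blocking"),  # no device lists this kind, so it falls back
]

# Devices are tried in order, so the accelerator is preferred where it fits.
placement = schedule(tasks, [gpu, cpu], fallback=cpu)
print(placement)
```

In this toy setup the developer-facing part is only the list of tasks; swapping in a different device list changes the placement without touching the task definitions, which is the property the unified model aims for.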
The paper also presents an evaluation of the CodeFlow prototype, demonstrating its performance characteristics and the potential benefits of the unified programming model.
Stats
The latency and bandwidth of CXL memory compared to CPU system memory:
Local DDR5 latency: 108.2 ns, bandwidth: 105.0 GiB/s
Remote DDR5 latency: 171.5 ns, bandwidth: 59.1 GiB/s
Local CXL latency: 371.2 ns, bandwidth: 17.4 GiB/s
Remote CXL latency: 538.0 ns, bandwidth: 9.0 GiB/s
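To put these figures in perspective, the short calculation below expresses each memory tier relative to local DDR5. The numbers are taken directly from the list above; the variable names and the choice of local DDR5 as the baseline are assumptions made for this illustration.

```python
# (latency in ns, bandwidth in GiB/s), copied from the measurements above
stats = {
    "Local DDR5": (108.2, 105.0),
    "Remote DDR5": (171.5, 59.1),
    "Local CXL": (371.2, 17.4),
    "Remote CXL": (538.0, 9.0),
}

base_latency, base_bandwidth = stats["Local DDR5"]

# For each tier: (latency multiple of baseline, fraction of baseline bandwidth)
relative = {
    name: (latency / base_latency, bandwidth / base_bandwidth)
    for name, (latency, bandwidth) in stats.items()
}

for name, (latency_x, bandwidth_frac) in relative.items():
    print(f"{name}: {latency_x:.2f}x latency, {bandwidth_frac:.0%} of bandwidth")
```

By this measure, local CXL memory has roughly 3.4 times the latency and about 17% of the bandwidth of local DDR5, which suggests why a runtime that shares data over CXL has to be careful about where threads and their data are placed.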
Quotes
"CodeFlow abstracts architecture computation in programming language runtime and utilizes CXL as a unified data exchange protocol."
"Workloads written in high-level languages such as C++ and Rust can be compiled to CodeFlow, which schedules different parts of the workload to suitable accelerators without requiring the developer to implement code or call APIs for specific accelerators."