Key Idea
Containerization can improve reproducibility of HPC applications, but may introduce performance overheads. This study evaluates the performance impact of running the HPX/Kokkos-based astrophysics application Octo-Tiger in Singularity containers on a homogeneous CPU-based supercomputer (Fugaku) and a heterogeneous CPU-GPU cluster (DeepBayou).
Summary
The paper investigates the use of containers, specifically Singularity, for running the HPX/Kokkos-based astrophysics application Octo-Tiger on HPC resources. It covers the following key points:
Motivation for using containers in HPC:
Improved reproducibility by packaging the application and dependencies
Easier deployment on different HPC systems
Software stack for Octo-Tiger:
Octo-Tiger is built on top of the HPX asynchronous many-task runtime system and uses the Kokkos performance-portable programming model.
It also relies on several other dependencies, such as HDF5, Boost, and CUDA.
Workflow for building and running Octo-Tiger in containers:
The authors use Spack to manage the dependencies and to generate a Dockerfile.
They then convert the Docker image to a Singularity container, which can be executed on HPC clusters without requiring root access.
Challenges encountered include handling vendor-specific compilers and libraries on the Fugaku supercomputer.
Performance evaluation:
On the Fugaku supercomputer (homogeneous, CPU-based), the regular (non-containerized) runs were about 56 seconds faster on average than the Singularity container runs (221.1 s vs. 277.1 s, roughly 25% overhead).
On the DeepBayou hybrid CPU-GPU cluster, the CPU-only runs showed comparable performance between the regular and containerized runs. However, the distributed GPU runs in the container crashed with CUDA errors.
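The build workflow summarized above can be sketched as a small Python driver. This is a minimal, hypothetical sketch, not the authors' scripts: it assumes a Spack environment whose `container` section lets `spack containerize` emit a Dockerfile, a local Docker daemon for the image build, and Singularity's `docker-daemon://` source for the conversion. The spec name `octotiger`, the `ubuntu:22.04` base image, the tag `octotiger:latest`, and the file name `octotiger.sif` are illustrative placeholders. It also ignores the Fugaku-specific issues noted above, such as targeting the Arm-based A64FX processor and the vendor compilers. By default it only prints the commands.

```python
"""Hypothetical dry-run driver for a Spack -> Docker -> Singularity build.

All names below (spec, base image, tag, file names) are illustrative
assumptions, not the configuration used in the paper.
"""
import shlex
import subprocess
from pathlib import Path

# Hypothetical Spack environment manifest; the `container` section tells
# `spack containerize` which recipe format and base images to use.
SPACK_YAML = """\
spack:
  specs:
  - octotiger
  container:
    format: docker
    images:
      os: "ubuntu:22.04"
      spack: develop
"""

IMAGE = "octotiger:latest"  # hypothetical Docker image tag
SIF = "octotiger.sif"       # hypothetical Singularity image file


def build_commands() -> list[list[str]]:
    """Return the build steps as argument lists, in execution order."""
    return [
        # 1. Render the Spack environment in the cwd into a Dockerfile (stdout).
        ["spack", "containerize"],
        # 2. Build the Docker image from that Dockerfile (needs a Docker daemon).
        ["docker", "build", "-t", IMAGE, "."],
        # 3. Convert the local Docker image into a Singularity image file,
        #    which users can then run on the cluster without root access.
        ["singularity", "build", SIF, f"docker-daemon://{IMAGE}"],
    ]


def run(env_dir: str = ".", execute: bool = False) -> None:
    """Print each step; only run the steps when execute=True."""
    env = Path(env_dir)
    if execute:
        env.mkdir(parents=True, exist_ok=True)
        (env / "spack.yaml").write_text(SPACK_YAML)
    for cmd in build_commands():
        print("$", shlex.join(cmd))
        if not execute:
            continue
        if cmd[:2] == ["spack", "containerize"]:
            # Save the generated recipe as the Dockerfile of the build context.
            out = subprocess.run(cmd, cwd=env, check=True,
                                 capture_output=True, text=True)
            (env / "Dockerfile").write_text(out.stdout)
        else:
            subprocess.run(cmd, cwd=env, check=True)


if __name__ == "__main__":
    run()  # dry run: prints the three commands without executing them
```

Keeping the steps as plain argument lists makes the dry run easy to inspect before anything is executed; `run(execute=True)` writes `spack.yaml` and `Dockerfile` into `env_dir` and then runs the three steps in order.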
The paper concludes that containers offer benefits for reproducibility and portability, but that building the containers can be more challenging than a native installation. The performance impact depends on the target HPC system: on the homogeneous CPU-based Fugaku the containerized runs were clearly slower, whereas the CPU-only runs on the hybrid CPU-GPU DeepBayou cluster showed comparable performance with and without the container.
Statistics
The paper provides the following performance statistics:
On the Fugaku supercomputer (run times in seconds):
Statistic            Regular   Singularity
Minimum              214.1     267.5
Median               215.2     277.3
Average              221.1     277.1
Maximum              237.6     286.8
Standard deviation   9.8       6.2
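The container overhead on Fugaku can be recomputed directly from these reported summary values. The short Python sketch below does only that arithmetic; it works from the published summary statistics rather than the raw per-run timings, so it says nothing about statistical significance.

```python
# Summary statistics reported for the Fugaku runs, in seconds.
REGULAR = {"min": 214.1, "median": 215.2, "mean": 221.1, "max": 237.6, "std": 9.8}
SINGULARITY = {"min": 267.5, "median": 277.3, "mean": 277.1, "max": 286.8, "std": 6.2}


def overhead(native: dict[str, float],
             container: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Container slowdown per statistic as (seconds, percent of native), rounded to 0.1."""
    result = {}
    for key in ("min", "median", "mean", "max"):
        diff = container[key] - native[key]
        result[key] = (round(diff, 1), round(100.0 * diff / native[key], 1))
    return result


if __name__ == "__main__":
    for stat, (abs_s, rel_pct) in overhead(REGULAR, SINGULARITY).items():
        print(f"{stat:>6}: +{abs_s:.1f} s ({rel_pct:.1f} %)")
    # Disjoint ranges: the slowest regular run beats the fastest container run.
    print("ranges overlap:", REGULAR["max"] >= SINGULARITY["min"])
```

For the average this gives a difference of 56.0 s (about 25 %), and the slowdown ranges from roughly 21 % (maximum) to 29 % (median). Even the slowest regular run (237.6 s) finished before the fastest Singularity run (267.5 s), so the two ranges do not overlap.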