GungHo, the dynamical core of the LFRic weather and climate model, shows excellent scalability across a range of supercomputer systems, and its I/O performance can also be substantially improved through configuration changes.
The LFRic weather and climate model, written in a domain-specific language and using a domain-specific compiler, demonstrates good scaling behavior up to large node counts on different generations of HPE Cray EX supercomputers. The performance analysis reveals the impact of algorithm choices, such as redundant computation, and the scaling behavior with OpenMP threads. The I/O performance of the XIOS server is also analyzed and optimized.
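For illustration only (this is not LFRic or PSyclone code, and all names are hypothetical): the redundant-computation trade-off mentioned above can be sketched with a toy 1D stencil, where a deeper halo lets a rank take several time steps between halo exchanges at the price of redundantly updating halo cells itself.

```python
import numpy as np

def toy_halo_schedule(halo_depth, n_steps, n_owned=64):
    """Schematic model of redundant computation for a 1D 3-point stencil:
    with a halo of depth d, a rank exchanges halos only every d steps and
    redundantly updates its halo cells in between instead of receiving them."""
    u = np.zeros(n_owned + 2 * halo_depth)
    u[halo_depth:-halo_depth] = np.random.rand(n_owned)
    exchanges = redundant_updates = 0
    for step in range(n_steps):
        if step % halo_depth == 0:
            exchanges += 1  # stand-in for an MPI halo exchange
        # update owned cells *and* halo cells (the redundant work)
        u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])
        redundant_updates += 2 * halo_depth
    return exchanges, redundant_updates

for d in (1, 2, 4):
    ex, extra = toy_halo_schedule(d, n_steps=16)
    print(f"halo depth {d}: {ex} exchanges, {extra} redundant cell updates")
```

Deeper halos cut the number of exchanges proportionally while the redundant arithmetic grows, which is the kind of algorithmic choice the performance analysis above weighs.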
The Cerebras Wafer Scale Engine (WSE) is a powerful AI accelerator that can efficiently train and run inference on large language models (LLMs) like BERT and GPT-3 by leveraging its high memory bandwidth, abundant compute resources, and low-overhead communication between cores.
ProTEA is a runtime programmable FPGA accelerator designed to efficiently execute the computationally intensive multi-head attention and feedforward neural network layers of transformer encoder models.
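As a point of reference for what ProTEA accelerates, here is a minimal NumPy sketch of multi-head scaled dot-product attention; the dimensions and weight names are illustrative, and the FPGA implementation itself is of course organized very differently.

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Minimal multi-head scaled dot-product attention (batch dim omitted).
    x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):  # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)              # row-wise softmax
    heads = weights @ v                                    # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d, s, h = 64, 16, 4
x = rng.standard_normal((s, d))
W = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)]
print(multi_head_attention(x, *W, n_heads=h).shape)  # (16, 64)
```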
Optimizing the performance, energy consumption, and energy-delay product of OpenMC with ytopt-libe, a new autotuning framework that integrates ytopt and libEnsemble.
Integrating the ytopt autotuning framework with libEnsemble to accelerate the autotuning process and improve the accuracy of the surrogate model, enabling efficient exploration of the large parameter space of the ECP application OpenMC to optimize its performance, energy, and energy-delay product.
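The idea can be made concrete with a generic surrogate-driven autotuning loop. This sketch deliberately avoids the real ytopt/libEnsemble APIs, and run_app is a hypothetical stand-in for timing an OpenMC configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def run_app(threads, block_size):
    """Hypothetical objective standing in for timing an OpenMC run."""
    return (threads - 24) ** 2 / 50 + (block_size - 128) ** 2 / 5000 + 1.0

space = [(t, b) for t in range(1, 65) for b in (32, 64, 128, 256, 512)]
rng = np.random.default_rng(1)

# Seed the surrogate with a few random evaluations, then alternate:
# fit surrogate -> evaluate the config it predicts to be fastest.
idx = rng.choice(len(space), size=8, replace=False)
X = [space[i] for i in idx]
y = [run_app(*cfg) for cfg in X]

for _ in range(20):
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    if rng.random() < 0.2:                  # occasional random exploration
        cand = space[rng.integers(len(space))]
    else:                                   # exploit the surrogate's best guess
        cand = space[int(np.argmin(model.predict(space)))]
    if cand not in X:                       # evaluate only new configs
        X.append(cand)
        y.append(run_app(*cand))

best = X[int(np.argmin(y))]
print("best (threads, block_size):", best, "runtime:", min(y))
```

In the actual framework, libEnsemble would dispatch these evaluations concurrently across workers rather than running them one at a time, which is where the acceleration comes from.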
Integrating projection-based reduced order models (PROMs) with high-performance computing (HPC) is critical for developing efficient and accurate digital twins, particularly for real-time monitoring and predictive maintenance of industrial systems.
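To unpack "projection-based", here is a minimal POD-Galerkin sketch (illustrative, not from the paper): a reduced basis is extracted from full-order solution snapshots via SVD, and the full operator is projected onto it so that new solves happen in r dimensions instead of n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, r = 200, 40, 5            # full dim, #snapshots, reduced dim
A = -np.eye(n) + 0.01 * rng.standard_normal((n, n))   # full-order operator

# Snapshots: full-order solves for right-hand sides from a low-dim family,
# so a small POD basis can actually capture the solution manifold.
B = rng.standard_normal((n, r))
S = np.column_stack([np.linalg.solve(A, B @ rng.standard_normal(r))
                     for _ in range(k)])

# POD: leading left singular vectors of the snapshot matrix form the basis
V, _, _ = np.linalg.svd(S, full_matrices=False)
V = V[:, :r]                                          # (n, r), orthonormal

A_r = V.T @ A @ V                                     # Galerkin projection
b = B @ rng.standard_normal(r)                        # unseen RHS, same family
x_r = np.linalg.solve(A_r, V.T @ b)                   # solve in r dims, not n
x_rom = V @ x_r                                       # lift back to full space
x_fom = np.linalg.solve(A, b)
print("relative error:",
      np.linalg.norm(x_rom - x_fom) / np.linalg.norm(x_fom))
```

The cheap reduced solve is what makes PROMs attractive for the real-time digital-twin use cases named above.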
A new mathematical model is proposed that leverages the types of executed instructions to accurately estimate per-process energy consumption on multi-socket computers.
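The paper's actual model is not reproduced here; a commonly assumed form is a linear model E ≈ Σᵢ cᵢ Nᵢ over instruction-class counts Nᵢ, whose per-class costs cᵢ can be calibrated by least squares, as in this hypothetical sketch:

```python
import numpy as np

# Hypothetical calibration: columns are counts of instruction classes
# (e.g., scalar FP, vector FP, load/store, branch) measured per run,
# and E holds measured energies (J) for the same runs.
rng = np.random.default_rng(0)
true_cost = np.array([0.8e-9, 2.5e-9, 1.2e-9, 0.3e-9])  # J per instruction
N = rng.integers(int(1e8), int(1e10), size=(30, 4)).astype(float)
E = N @ true_cost + rng.normal(0, 0.05, 30)             # noisy measurements

# Fit per-class energy costs c by least squares: E ~= N @ c
c, *_ = np.linalg.lstsq(N, E, rcond=None)
print("estimated J/instruction per class:", c)

# Predict the energy of a new process from its instruction-class counts
new_counts = np.array([3e9, 1e9, 2e9, 5e8])
print("predicted energy (J):", new_counts @ c)
```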
Containerization can improve reproducibility of HPC applications, but may introduce performance overheads. This study evaluates the performance impact of running the HPX/Kokkos-based astrophysics application Octo-Tiger in Singularity containers on a homogeneous CPU-based supercomputer (Fugaku) and a heterogeneous CPU-GPU cluster (DeepBayou).
A systematic training method called ScaleFold that incorporates optimizations addressing the key factors preventing AlphaFold training from scaling to more compute resources, enabling the training to complete in 10 hours.