Emerging multi-modal generative AI models, such as text-to-image (TTI) and text-to-video (TTV) models, exhibit unique system characteristics that require tailored optimizations beyond those designed for traditional large language models (LLMs). Careful characterization of these workloads is critical to enable efficient deployment at scale.
LearnedFTL employs learned indexes to improve the address-translation efficiency of flash-based SSDs, reducing the double reads (a mapping-page read followed by the data-page read) that address translation induces on random read accesses.
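The core idea behind a learned translation index can be illustrated with a minimal sketch: fit a linear model over sorted (logical page address, physical page address) pairs, record the worst-case prediction error, and at lookup time return a small candidate range of physical pages instead of first reading a mapping page. All names here are hypothetical, and this is a toy illustration of the learned-index concept rather than LearnedFTL's actual design.

```python
class LearnedSegment:
    """Toy learned index: a linear model LPA -> PPA with a recorded
    maximum prediction error, so lookups return a guaranteed range."""

    def __init__(self, mapping):
        # mapping: list of (lpa, ppa) pairs, sorted by lpa
        n = len(mapping)
        lpas = [l for l, _ in mapping]
        ppas = [p for _, p in mapping]
        mean_l = sum(lpas) / n
        mean_p = sum(ppas) / n
        denom = sum((l - mean_l) ** 2 for l in lpas) or 1.0
        # least-squares fit of ppa ≈ slope * lpa + intercept
        self.slope = sum((l - mean_l) * (p - mean_p) for l, p in mapping) / denom
        self.intercept = mean_p - self.slope * mean_l
        # worst-case error bounds the candidate range at lookup time
        self.max_err = max(abs(self.predict(l) - p) for l, p in mapping)

    def predict(self, lpa):
        return self.slope * lpa + self.intercept

    def lookup(self, lpa):
        """Return a [lo, hi) range of PPAs guaranteed to contain the
        true mapping, avoiding a separate mapping-page read."""
        center = self.predict(lpa)
        return int(center - self.max_err), int(center + self.max_err) + 1
```

With a perfectly linear mapping the range collapses to a single page; the less regular the mapping, the wider the range, which is why such schemes help most on sequentially written data.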
The authors extend the ReFrame testing framework to support Kubernetes as a backend scheduler, and use it to benchmark machine learning applications, including ResNet-50, DeepCAM, and CosmoFlow, across a range of heterogeneous hardware platforms managed by EPCC.
RELMAS is a deep reinforcement learning-based online scheduling algorithm that schedules deep neural network inference requests on a multi-tenant, multi-accelerator system, maximizing service-level agreement (SLA) satisfaction rates while accounting for heterogeneous accelerator capabilities and memory bandwidth constraints.
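The scheduling problem itself can be made concrete with a small greedy baseline: pick, for each request, the accelerator whose contention-adjusted latency best fits the request's deadline. This is not RELMAS's learned policy (which would replace the argmin below with a trained agent), and every name, field, and latency model here is a hypothetical simplification.

```python
def effective_latency(base_latency, bw_need, free_bw):
    """Inflate latency when a request's memory-bandwidth need exceeds
    what is currently free on the accelerator (a crude contention model)."""
    if bw_need <= free_bw:
        return base_latency
    return base_latency * (bw_need / max(free_bw, 1e-9))

def schedule(request, accelerators):
    """Greedy SLA-aware placement: choose the accelerator with the lowest
    contention-adjusted latency for this request's model.

    request:      dict with 'model', 'bw' (bandwidth need), 'deadline'
    accelerators: list of dicts with 'latency' (per-model map), 'free_bw'
    """
    best, best_lat = None, float("inf")
    for acc in accelerators:
        lat = effective_latency(
            acc["latency"][request["model"]], request["bw"], acc["free_bw"]
        )
        if lat < best_lat:
            best, best_lat = acc, lat
    meets_sla = best_lat <= request["deadline"]
    return best, best_lat, meets_sla
```

The point of the example is the trade-off a learned scheduler must internalize: the nominally fastest accelerator can miss the deadline once shared bandwidth contention is factored in, which is exactly the coupling a DRL policy is trained to exploit.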