Leveraging Idle Supercomputer Nodes for Scalable Deep Neural Network Training
MalleTrain, a system that enables efficient utilization of idle fragmented nodes on batch-scheduled supercomputer clusters for large-scale deep neural network training, including dynamic workloads such as neural architecture search and hyperparameter optimization.