Optimizing Hardware Accelerator Architectures for Distributed Deep Learning Training
WHAM proposes a critical-path-based heuristic that efficiently searches the hardware accelerator design space for architectures that maximize training throughput and energy efficiency in both single-device and distributed deep learning training.
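To make the idea concrete, the sketch below shows a generic critical-path-driven design-space search, not WHAM's actual algorithm: each candidate accelerator configuration is scored by a roofline-style estimate of the training step's critical path, and the search keeps the configuration with the best throughput-per-joule. The `Layer`/`Config` structures, the sequential critical-path model, and all numbers are illustrative assumptions.

```python
# Hedged sketch (NOT WHAM's published algorithm): a toy critical-path-based
# heuristic search over candidate accelerator configurations.
from dataclasses import dataclass

@dataclass
class Layer:
    flops: float        # floating-point ops per training step (assumed given)
    bytes_moved: float  # DRAM traffic per training step (assumed given)

@dataclass
class Config:
    name: str
    peak_flops: float   # peak compute, FLOP/s
    mem_bw: float       # memory bandwidth, bytes/s
    power: float        # average power draw, watts

def layer_latency(layer: Layer, cfg: Config) -> float:
    # Roofline-style estimate: a layer is bound by compute or by memory traffic.
    return max(layer.flops / cfg.peak_flops, layer.bytes_moved / cfg.mem_bw)

def critical_path(layers: list[Layer], cfg: Config) -> float:
    # Simplified sequential model: the step's critical path is the sum of
    # per-layer latencies (no compute/communication overlap modeled here).
    return sum(layer_latency(l, cfg) for l in layers)

def search(layers: list[Layer], configs: list[Config]) -> Config:
    # Heuristic objective: throughput per joule (higher is better).
    def score(cfg: Config) -> float:
        t = critical_path(layers, cfg)
        energy = cfg.power * t
        return (1.0 / t) / energy
    return max(configs, key=score)

# Illustrative use with made-up workload and candidate numbers.
layers = [Layer(flops=1e12, bytes_moved=1e9),
          Layer(flops=2e12, bytes_moved=4e9)]
configs = [Config("A", peak_flops=1e14, mem_bw=1e12, power=300.0),
           Config("B", peak_flops=2e14, mem_bw=5e11, power=500.0)]
best = search(layers, configs)
```

A real system would replace the sequential critical-path model with a schedule-aware one (e.g., accounting for pipeline or data parallelism and communication overlap), which is where a critical-path analysis earns its keep.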