Efficient Communication-Aware Distributed Training of Low-Rank Neural Networks
AB-training is a novel data-parallel training method that decomposes weight matrices into low-rank representations and uses independent group-based training to significantly reduce network traffic during distributed neural network training.
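The communication saving comes from synchronizing two small factors instead of one full weight matrix. A minimal sketch of the idea, assuming a rank-16 factorization of a 512x256 layer (all names and sizes here are illustrative, not the paper's implementation):

```python
import numpy as np

# Replace a dense weight matrix W with a low-rank product A @ B.
# Workers then exchange only the entries of A and B during synchronization.
m, n, rank = 512, 256, 16  # illustrative layer shape and rank

rng = np.random.default_rng(0)
A = rng.standard_normal((m, rank))   # tall factor, m x rank
B = rng.standard_normal((rank, n))   # wide factor, rank x n
W_low_rank = A @ B                   # reconstructed m x n weight matrix

dense_params = m * n                    # values synchronized for dense W
factored_params = m * rank + rank * n   # values synchronized for A and B

print(dense_params, factored_params, factored_params / dense_params)
# → 131072 12288 0.09375
```

With these sizes the factored form transmits under 10% of the values the dense matrix would, which is the source of the reduced network traffic; the rank controls the trade-off between compression and representational capacity.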