Core Concepts
THC introduces a novel bi-directional compression framework that accelerates distributed deep learning by eliminating the computational overhead of compression without sacrificing model accuracy.
Abstract
Deep neural networks (DNNs) require distributed training on larger clusters as models and datasets grow.
Compression schemes like THC reduce communication overhead and accelerate training.
THC enables direct aggregation of compressed values, eliminating the overhead of decompressing and recompressing gradients at the aggregation point (see the aggregation sketch below).
THC is compatible with in-network aggregation (INA) on programmable switches for further acceleration.
Evaluation shows THC reaches target accuracy faster than state-of-the-art systems.
THC simplifies the Parameter Server (PS) architecture and improves training throughput.
THC tolerates packet loss and stragglers without compromising model accuracy or convergence.
The implementation includes GPU-based compression, Randomized Hadamard Transform (RHT) pre-processing, and optimal lookup-table construction (see the RHT sketch below).
Evaluation on computer vision and language models demonstrates the effectiveness of THC.
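To make the "direct aggregation of compressed values" concrete, here is a minimal sketch in NumPy. It assumes a shared affine (uniform) lookup table, table[i] = lo + i * step: because the table is affine and identical across workers, summing the integer indices and decoding once equals summing the individually dequantized gradients, so the parameter server never decompresses per worker. The function names (compress, decode_sum) and the plain affine table are our simplifications, not THC's API; the paper constructs its table to minimize quantization error and uses stochastic rounding.

```python
import numpy as np

lo, hi, levels = -1.0, 1.0, 16          # shared quantization range and table size
step = (hi - lo) / (levels - 1)

def compress(grad):
    """Map each gradient entry to its nearest shared-table index (rounding simplified)."""
    idx = np.round((np.clip(grad, lo, hi) - lo) / step)
    return idx.astype(np.int32)

def decode_sum(idx_sum, num_workers):
    """Decode an element-wise sum of indices from `num_workers` workers in one step."""
    return num_workers * lo + idx_sum * step

rng = np.random.default_rng(0)
grads = [rng.normal(0, 0.3, size=8) for _ in range(4)]   # 4 workers' gradients

idx_sum = sum(compress(g) for g in grads)                # aggregate compressed values
aggregated = decode_sum(idx_sum, num_workers=4)

# Sanity check: matches summing each worker's dequantized gradient separately.
reference = sum(lo + compress(g) * step for g in grads)
assert np.allclose(aggregated, reference)
```

Because the aggregation operates on small integer indices, the same summation can run on a programmable switch, which is what makes THC compatible with INA.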
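The RHT pre-processing step can likewise be sketched briefly. The idea, assuming a power-of-two dimension, is the rotation x ↦ (1/√n) · H · (d ⊙ x), where d is a shared random ±1 sign vector and H a Hadamard matrix: it spreads energy evenly across coordinates so that a single shared lookup table quantizes all entries well, and since the transform is orthogonal the receiver inverts it exactly after aggregation. The helper names (fwht, rht, inverse_rht) are ours, not THC's GPU implementation.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(n log n), n a power of two."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x

def rht(x, signs):
    """Randomized Hadamard Transform: flatten the value distribution before quantizing."""
    return fwht(signs * x) / np.sqrt(len(x))

def inverse_rht(y, signs):
    """Exact inverse (H is symmetric and H @ H = n * I, and signs**2 == 1)."""
    return signs * fwht(y) / np.sqrt(len(y))

rng = np.random.default_rng(1)
signs = rng.choice([-1.0, 1.0], size=1024)                  # shared by workers and PS
x = rng.standard_normal(1024) * np.geomspace(1, 100, 1024)  # skewed magnitudes

y = rht(x, signs)
assert np.allclose(inverse_rht(y, signs), x)                # orthogonal: exact round trip
```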
Statistics
"Our evaluation shows that training representative vision and language models with THC reaches target accuracy by 1.40× to 1.47× faster using INA and 1.28× to 1.33× faster using a software PS compared with state-of-the-art systems."
"THC with the programmable switch also improves the training throughput by up to 54% over Horovod RDMA."
Quotes
"THC introduces Tensor Homomorphic Compression, a novel bi-directional compression framework that enables the direct aggregation of compressed values."
"THC achieves the target accuracy faster using INA and a software PS compared to state-of-the-art systems."