AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering 100s of TeraOps/s of computational capacity) is expected to be severely communication constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to a wide variety of layers seen in Deep …