A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks

Distributed synchronous stochastic gradient descent (S-SGD) with data parallelism has been widely used in training large-scale deep neural networks (DNNs), but it typically requires very high communication bandwidth between computational workers (e.g., GPUs) to exchange gradients iteratively. Recently, Top-k sparsification techniques have been proposed to reduce the volume of data …
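For context, here is a minimal sketch of the local Top-k sparsification step the abstract refers to: each worker sends only the k largest-magnitude gradient entries instead of the full dense gradient. This is an illustration of the generic technique, not the paper's global (gTop-k) variant; the function name `topk_sparsify` and the 0.1% density are illustrative choices.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a flattened gradient.

    Returns (indices, values): the data a worker would communicate
    instead of the full dense gradient.
    """
    flat = grad.ravel()
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

# Example: sparsify a 1M-parameter gradient down to ~0.1% density.
grad = np.random.randn(1_000_000).astype(np.float32)
idx, vals = topk_sparsify(grad, k=1000)
print(idx.shape, vals.shape)  # (1000,) (1000,): ~0.1% of the original volume
```

In practical Top-k schemes, the entries that are not sent are typically accumulated locally as a residual and added to the next iteration's gradient, so that small updates are eventually transmitted rather than lost; that error-feedback detail is omitted from the sketch above.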