Asynchrony begets momentum, with an application to deep learning

Asynchronous methods are widely used in deep learning, but have limited theoretical justification when applied to non-convex problems. We show that running stochastic gradient descent (SGD) in an asynchronous manner can be viewed as adding a momentum-like term to the SGD iteration. Our result does not assume convexity of the …
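The momentum view can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental setup, and it rests on several assumptions that are not stated in the abstract above: the objective is the one-dimensional quadratic f(w) = w²/2, each update uses a gradient whose staleness is drawn independently from a geometric distribution with parameter 1/M (M being a hypothetical number of workers), and a gradient that would refer to an iterate before the start of training is treated as zero. Under those assumptions, the average asynchronous iterate should track deterministic gradient descent with momentum 1 − 1/M and step size α/M. The function names `async_sgd_mean` and `momentum_gd` are made up for this illustration.

```python
import random


def async_sgd_mean(num_workers=4, step=0.1, steps=30, runs=10000, seed=0):
    """Average iterate of simulated asynchronous SGD on f(w) = 0.5 * w**2.

    Assumption of this sketch: the staleness of each applied gradient is
    geometric on {0, 1, 2, ...} with success probability 1 / num_workers.
    """
    rng = random.Random(seed)
    p = 1.0 / num_workers
    totals = [0.0] * (steps + 1)
    for _ in range(runs):
        w = [1.0]  # every run starts from w_0 = 1
        for t in range(steps):
            # Sample staleness: number of failures before the first success.
            tau = 0
            while rng.random() >= p:
                tau += 1
            # The gradient of 0.5 * w**2 is w, evaluated at the stale iterate.
            # If no iterate that old exists yet, apply a zero gradient.
            grad = w[t - tau] if tau <= t else 0.0
            w.append(w[t] - step * grad)
        for t, value in enumerate(w):
            totals[t] += value
    return [total / runs for total in totals]


def momentum_gd(num_workers=4, step=0.1, steps=30):
    """Deterministic gradient descent with momentum on the same quadratic,
    using momentum 1 - 1/M and step size step / M."""
    mu = 1.0 - 1.0 / num_workers
    effective_step = step / num_workers
    w = [1.0]
    velocity = 0.0
    for t in range(steps):
        velocity = mu * velocity - effective_step * w[t]
        w.append(w[t] + velocity)
    return w


if __name__ == "__main__":
    simulated = async_sgd_mean()
    reference = momentum_gd()
    gap = max(abs(a - b) for a, b in zip(simulated, reference))
    print(f"largest gap between the two trajectories: {gap:.4f}")
```

A quadratic is used here because its gradient is linear, so the expectation passes through the gradient and the two recursions agree exactly in expectation; for a general non-convex objective the correspondence holds only in terms of expected gradients, and this toy example does not demonstrate that case.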