Entropy-SGD: biasing gradient descent into wide valleys

Abstract: This paper proposes Entropy-SGD, a new optimization algorithm for training deep neural networks, motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian, with very few positive or negative eigenvalues. We …
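
To make the Hessian claim concrete, here is a minimal sketch of how one might measure the proportion of almost-zero Hessian eigenvalues at a point. It is an illustration only and not the Entropy-SGD algorithm: the toy loss, the finite-difference Hessian, the tolerance `tol`, and the helper names `numerical_hessian` and `near_zero_fraction` are all assumptions introduced here.

```python
import numpy as np


def numerical_hessian(f, x, eps=1e-4):
    """Central-difference estimate of the Hessian of f at x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = eps
            e_j = np.zeros(n)
            e_j[j] = eps
            hess[i, j] = (
                f(x + e_i + e_j) - f(x + e_i - e_j)
                - f(x - e_i + e_j) + f(x - e_i - e_j)
            ) / (4.0 * eps ** 2)
    # Symmetrize to remove small numerical asymmetries.
    return 0.5 * (hess + hess.T)


def near_zero_fraction(hess, tol=1e-3):
    """Fraction of eigenvalues whose magnitude is below tol (tol is an assumed cutoff)."""
    eigvals = np.linalg.eigvalsh(hess)
    return float(np.mean(np.abs(eigvals) < tol))


def toy_loss(w):
    # Assumed toy landscape: curved along w[0] only, flat along every other direction.
    return w[0] ** 2


H = numerical_hessian(toy_loss, np.zeros(4))
frac = near_zero_fraction(H)
print(frac)
```

For this toy loss the Hessian at the origin is approximately diag(2, 0, 0, 0), so three of the four eigenvalues fall below the cutoff and the fraction is 0.75. For a real network the full Hessian is usually too large to form explicitly, so this dense approach is only suitable for small examples.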