Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training loss, even with a large stepsize -- about $2/\textrm{sharpness}$. It still converges for even …
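
As a rough illustration of the setup (a minimal sketch, not the paper's code), the snippet below runs GD on the simplest instance of this model: a depth-2 linear network with scalar input and output, $f(x) = abx$, trained with squared loss on a single example. The data, initialization, stepsize choices, and iteration counts are assumptions made for the example; at a global minimum ($ab = y$), the sharpness of this loss is $a^2 + b^2$.

```python
import numpy as np

# Minimal illustrative sketch (an assumption, not the paper's code): depth-2
# linear network f(x) = a * b * x with squared loss on one example (x, y) = (1, 1).
# Global minima satisfy a * b = 1, where the sharpness equals a^2 + b^2.

def loss(a, b, y=1.0):
    return 0.5 * (a * b - y) ** 2

def grads(a, b, y=1.0):
    r = a * b - y                      # residual
    return r * b, r * a                # dL/da, dL/db

def sharpness(a, b, y=1.0):
    # Largest Hessian eigenvalue of the loss at (a, b).
    r = a * b - y
    H = np.array([[b * b, r + a * b],
                  [r + a * b, a * a]])
    return np.linalg.eigvalsh(H)[-1]

a0, b0 = 2.0, 0.25                     # unbalanced initialization (assumption)

# Gradient descent with a large stepsize, just under 2 / sharpness at init.
a, b = a0, b0
eta = 1.9 / sharpness(a, b)
for _ in range(500):
    ga, gb = grads(a, b)
    a, b = a - eta * ga, b - eta * gb
print(f"large stepsize:  loss={loss(a, b):.2e}, sharpness={sharpness(a, b):.3f}")

# Tiny stepsize as a stand-in for gradient flow, from the same initialization.
a, b = a0, b0
for _ in range(50_000):
    ga, gb = grads(a, b)
    a, b = a - 1e-3 * ga, b - 1e-3 * gb
print(f"small stepsize:  loss={loss(a, b):.2e}, sharpness={sharpness(a, b):.3f}")
```

In runs like this one, the small-stepsize trajectory approximately preserves the gradient-flow invariant $a^2 - b^2$ and so inherits the imbalance of the initialization, while the large-stepsize iterates tend to end at a more balanced, and hence flatter, minimum; this is the qualitative contrast the title refers to, not a reproduction of the paper's analysis.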