The Sample Complexity of Gradient Descent in Stochastic Convex
Optimization
We analyze the sample complexity of full-batch Gradient Descent (GD) in the setting of non-smooth Stochastic Convex Optimization. We show that the generalization error of GD, with a common choice of hyper-parameters, can be $\tilde \Theta(d/m + 1/\sqrt{m})$, where $d$ is the dimension and $m$ is the sample size. This matches …