Type: Article
Publication Date: 2021-01-27
Citations: 68
DOI: https://doi.org/10.1093/imaiai/iaab002
Abstract
We consider a model for logistic regression where only a subset of features of size $p$ is used for training a linear classifier over $n$ training samples. The classifier is obtained by running gradient descent on logistic loss. For this model, we investigate the dependence of the classification error on the ratio $\kappa = p/n$. First, building on known deterministic results on the implicit bias of gradient descent, we uncover a phase-transition phenomenon for the case of Gaussian features: the classification error of the gradient descent solution is the same as that of the maximum-likelihood solution when $\kappa < \kappa_\star$, and that of the support vector machine when $\kappa > \kappa_\star$, where $\kappa_\star$ is a phase-transition threshold. Next, using the convex Gaussian min–max theorem, we sharply characterize the performance of both the maximum-likelihood and the support vector machine solutions. Combining these results, we obtain curves that explicitly characterize the classification error for varying values of $\kappa$. The numerical results validate the theoretical predictions and unveil double-descent phenomena that complement similar recent findings in linear regression settings as well as empirical observations in more complex learning scenarios.
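The following is a minimal simulation sketch, not the authors' code, illustrating the setting of the abstract: Gaussian features, binary labels drawn from a logistic model, a linear classifier trained by plain gradient descent on the logistic loss, and fresh-sample test error swept over $\kappa = p/n$. For simplicity it varies the model dimension $p$ directly rather than selecting a feature subset, and the sample size, signal strength, step size, and iteration count are all illustrative assumptions.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function

rng = np.random.default_rng(0)

def gd_logistic(X, y, steps=10_000, lr=0.2):
    """Plain gradient descent on the unregularized logistic loss.

    Per the implicit-bias results the abstract builds on: on linearly
    separable data the iterate grows in norm but converges in *direction*
    to the max-margin (hard-SVM) solution; on non-separable data it
    converges to the maximum-likelihood estimate. The phase transition at
    kappa_star separates these two regimes.
    """
    n, _ = X.shape
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(X.T @ (y * expit(-margins))) / n  # gradient of mean log-loss
        w -= lr * grad
    return w

def test_error(w, beta, n_test=50_000):
    """Classification error on fresh samples from the same logistic model."""
    X = rng.standard_normal((n_test, beta.size))
    y = np.where(rng.random(n_test) < expit(X @ beta), 1.0, -1.0)
    return np.mean(np.sign(X @ w) != y)

n = 200   # training-sample size (illustrative assumption)
r = 5.0   # signal strength ||beta|| (illustrative assumption)
for kappa in (0.1, 0.5, 1.0, 2.0, 4.0):
    p = max(1, int(kappa * n))
    beta = r * np.ones(p) / np.sqrt(p)  # ground-truth parameter (assumption)
    X = rng.standard_normal((n, p))     # i.i.d. Gaussian features
    y = np.where(rng.random(n) < expit(X @ beta), 1.0, -1.0)
    w = gd_logistic(X, y)
    print(f"kappa = {kappa:4.1f}: test error ≈ {test_error(w, beta):.3f}")
```

Sweeping $\kappa$ this way gives an empirical error curve of the kind the paper characterizes exactly; with enough iterations, the gradient descent output should track the maximum-likelihood solution for small $\kappa$ and the hard-margin SVM once the data become separable.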