On the diffusion approximation of nonconvex stochastic gradient descent

Type: Article

Publication Date: 2019-01-01

Citations: 61

DOI: https://doi.org/10.4310/amsa.2019.v4.n1.a1

Abstract

We study the Stochastic Gradient Descent (SGD) method in nonconvex optimization problems from the point of view of approximating diffusion processes. We prove rigorously that a diffusion process approximates the SGD algorithm weakly, using the weak form of the master equation for probability evolution. In the small step size regime and in the presence of omnidirectional noise, our weak approximating diffusion process suggests the following dynamics for the SGD iteration starting from a local minimizer (resp. saddle point): it escapes in a number of iterations that depends exponentially (resp. almost linearly) on the inverse step size. The results are obtained using the theory of random perturbations of dynamical systems (the theory of large deviations for local minimizers and the exit theory for unstable stationary points). In addition, we discuss the effects of batch size for deep neural networks, and we find that a small batch size helps SGD escape unstable stationary points and sharp minimizers. Our theory indicates that one should increase the batch size at a later stage so that SGD is trapped in flat minimizers, for better generalization.
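
To make the weak approximation concrete, here is a minimal sketch (an illustration, not code from the paper): it runs SGD on the one-dimensional double-well objective f(x) = (x^2 - 1)^2 alongside an Euler-Maruyama discretization of the approximating diffusion dX_t = -f'(X_t) dt + sqrt(eta) * sigma dW_t. The objective, the noise scale sigma, and all parameter values are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_f(x):
        # f(x) = (x**2 - 1)**2 has minimizers at x = +/-1 and an
        # unstable stationary point at x = 0
        return 4.0 * x * (x**2 - 1.0)

    eta = 0.01       # step size; also used as the SDE time step
    sigma = 1.0      # gradient-noise scale (illustrative choice)
    n_iters = 10_000

    x_sgd = 0.0      # start both chains at the unstable point x = 0
    x_sde = 0.0
    for _ in range(n_iters):
        # SGD step with an additive stochastic-gradient perturbation
        x_sgd -= eta * (grad_f(x_sgd) + sigma * rng.standard_normal())
        # Euler-Maruyama step for dX = -f'(X) dt + sqrt(eta)*sigma dW,
        # with dt = eta, so the noise increment has scale eta*sigma
        x_sde += -eta * grad_f(x_sde) \
                 + np.sqrt(eta) * sigma * np.sqrt(eta) * rng.standard_normal()

    print(f"SGD iterate after {n_iters} steps: {x_sgd:+.3f}")
    print(f"SDE iterate after {n_iters} steps: {x_sde:+.3f}")

Both chains leave the unstable point x = 0 quickly and then fluctuate around one of the minimizers x = +/-1. Shrinking sigma, for instance by increasing the batch size B (the gradient-noise scale decays roughly like 1/sqrt(B)), damps the diffusion term and makes escape from a minimizer exponentially slower, consistent with the batch-size discussion above.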

Locations

  • Annals of Mathematical Sciences and Applications
  • arXiv (Cornell University)

Similar Works

  • On the diffusion approximation of nonconvex stochastic gradient descent (2017) - Wenqing Hu, Chris Junchi Li, Lei Li, Jianguo Liu
  • Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent (2017) - Chris Junchi Li, Lei Li, Junyang Qian, Jianguo Liu
  • A Diffusion Approximation Theory of Momentum Stochastic Gradient Descent in Nonconvex Optimization (2021) - Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
  • A Diffusion Approximation Theory of Momentum SGD in Nonconvex Optimization (2018) - Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
  • Toward Deeper Understanding of Nonconvex Stochastic Optimization with Momentum using Diffusion Approximations (2018) - Tianyi Liu, Zhehui Chen, Enlu Zhou, Tuo Zhao
  • Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes (2018) - Chris Junchi Li, Zhaoran Wang, Han Liu
  • The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects (2018) - Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma
  • The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects (2019) - Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma
  • A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast (2020) - Zeke Xie, Issei Sato, Masashi Sugiyama
  • A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima (2020) - Zeke Xie, Issei Sato, Masashi Sugiyama
  • A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima (2021) - Zeke Xie, Issei Sato, Masashi Sugiyama
  • A Diffusion Theory For Minima Selection: Stochastic Gradient Descent Exponentially Favors Flat Minima (2020) - Zeke Xie, Issei Sato, Masashi Sugiyama
  • Dynamic of Stochastic Gradient Descent with State-Dependent Noise (2020) - Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie‐Yan Liu
  • The Regularization Effects of Anisotropic Noise in Stochastic Gradient Descent (2018) - Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma
  • On the Global Convergence of Continuous-Time Stochastic Heavy-Ball Method for Nonconvex Optimization (2017) - Wenqing Hu, Chris Junchi Li, Xiang Zhou
  • Stochastic Gradient Langevin Dynamics with Variance Reduction (2021) - Zhishen Huang, Stephen Becker
  • Uniform-in-Time Weak Error Analysis for Stochastic Gradient Descent Algorithms via Diffusion Approximation (2019) - Yuanyuan Feng, Tingran Gao, Lei Li, Jian‐Guo Liu, Yulong Lu
  • Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation (2020) - Yuanyuan Feng, Tingran Gao, Lei Li, Jian‐Guo Liu, Yulong Lu
