Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

Type: Article

Publication Date: 2023-06-10

Citations: 1

DOI: https://doi.org/10.1093/imanum/drad038

Abstract

We consider nonconvex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In such a setting, we provide a nonasymptotic analysis for the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2020). In particular, we establish nonasymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive nonasymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, which represents a key paradigm in machine learning. Numerical experiments are presented for this example, and they support our theoretical findings. Hence, in this setting, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve optimization problems involving neural networks with the ReLU activation function. In addition, we provide simulation results for synthetic examples where popular algorithms, e.g., ADAM, AMSGrad, RMSProp and (vanilla) stochastic gradient descent, may fail to find the minimizer of the objective function due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution. Moreover, we provide an empirical comparison of the performance of TUSLA with popular stochastic optimizers on real-world datasets, and we investigate the effect of the key hyperparameters of TUSLA on its performance.
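As a rough illustration of the kind of update rule behind a tamed Langevin scheme such as TUSLA, the sketch below implements a single iteration in NumPy. The specific taming factor (1 + sqrt(lambda) * |theta|^(2r)) and the hyperparameter names (lam, beta, r) are assumptions modelled on the taming approach of Lovas et al. (2020), not the authors' exact formulation, and should be checked against the paper.

```python
import numpy as np

def tusla_step(theta, stoch_grad, lam=1e-3, beta=1e8, r=1, rng=None):
    """One tamed unadjusted stochastic Langevin (TUSLA-style) update.

    theta      : current parameter vector (np.ndarray)
    stoch_grad : stochastic gradient evaluated at theta (np.ndarray)
    lam        : step size (lambda)
    beta       : inverse temperature controlling the injected noise
    r          : taming exponent (assumed form; see Lovas et al. (2020))
    """
    rng = np.random.default_rng() if rng is None else rng
    # Taming factor: divides the possibly super-linearly growing gradient
    # by a polynomial in |theta|, so each step's drift stays bounded.
    taming = 1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r)
    noise = rng.standard_normal(theta.shape)
    return theta - lam * stoch_grad / taming + np.sqrt(2.0 * lam / beta) * noise
```

Iterating this step, with a fresh stochastic gradient drawn at each iteration, gives a scheme of the type whose nonasymptotic error bounds are analysed in the paper.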

Locations

  • IMA Journal of Numerical Analysis
  • arXiv (Cornell University)
  • Edinburgh Research Explorer (University of Edinburgh)

Similar Works

  • Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function (2021). Dong‐Young Lim, Ariel Neufeld, Sotirios Sabanis, Ying Zhang
  • Langevin dynamics based algorithm e-TH$\varepsilon$O POULA for stochastic optimization problems with discontinuous stochastic gradient (2022). Dong‐Young Lim, Ariel Neufeld, Sotirios Sabanis, Ying Zhang
  • Taming Neural Networks with TUSLA: Nonconvex Learning via Adaptive Stochastic Gradient Langevin Algorithms (2023). Attila Lovas, Iosif Lytras, Miklós Rásonyi, Sotirios Sabanis
  • Taming neural networks with TUSLA: Non-convex learning via adaptive stochastic gradient Langevin algorithms (2020). Attila Lovas, Iosif Lytras, Miklós Rásonyi, Sotirios Sabanis
  • A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions (2021). Arnulf Jentzen, Adrian Riekert
  • Convergence of continuous-time stochastic gradient descent with applications to linear deep neural networks (2024). Gábor Lugosi, Eulàlia Nualart
  • Correction to: analysis of stochastic gradient descent in continuous time (2024). Jonas Latz
  • Adaptive Learning Rates for Faster Stochastic Gradient Methods (2022). Samuel Horváth, Konstantin Mishchenko, Peter Richtárik
  • Online Non-Stationary Stochastic Quasar-Convex Optimization (2024). Yuen-Man Pun, Iman Shames
  • A new non-convex framework to improve asymptotical knowledge on generic stochastic gradient descent (2023). Jean-Baptiste Fest, Audrey Repetti, Émilie Chouzenoux
  • Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions (2021). Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa
  • Non-asymptotic convergence analysis of the stochastic gradient Hamiltonian Monte Carlo algorithm with discontinuous stochastic gradient with applications to training of ReLU neural networks (2024). Lei Liang, Ariel Neufeld, Ying Zhang
  • Convergence Analysis of Two-layer Neural Networks with ReLU Activation (2017). Yuanzhi Li, Yuan Yang
  • A Unified Analysis of Stochastic Momentum Methods for Deep Learning (2018). Yan Yan, Tianbao Yang, Zhe Li, Qihang Lin, Yi Yang
  • Error Analysis for Empirical Risk Minimization Over Clipped ReLU Networks in Solving Linear Kolmogorov Partial Differential Equations (2024). Jichang Xiao, Xiaoqun Wang
  • Appropriate Learning Rates of Adaptive Learning Rate Optimization Algorithms for Training Deep Neural Networks (2020). Hideaki Iiduka

Works That Cite This (0)
