Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness
In many important machine learning applications, the standard assumption of a globally Lipschitz continuous gradient may fail to hold. This paper studies the more general (L0, L1)-smoothness setting, which is particularly relevant to training deep neural networks and to distributionally robust optimization (DRO). We demonstrate the significant …
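For context, the (L0, L1)-smoothness condition referenced above is commonly stated (following its standard formulation in the literature) as a bound on the Hessian that grows with the gradient norm; the constants L_0, L_1 here are the generic parameters of that condition, not quantities derived from this paper's analysis:

```latex
% (L0, L1)-smoothness: for a twice-differentiable f and all x,
%   \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|.
% Setting L_1 = 0 recovers the classical L_0-Lipschitz-gradient case.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \|\nabla f(x)\|
\]
```

Because the allowed curvature scales with the gradient norm, this class covers objectives (e.g., in deep learning and DRO) whose gradients are not globally Lipschitz.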