A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel Smith, Quoc V. Le
Type: Preprint
Publication Date: 2017-10-17
Citations: 10
Locations: arXiv (Cornell University)
Similar Works
- A Bayesian Perspective on Generalization and Stochastic Gradient Descent (2017). Samuel Smith, Quoc V. Le
- Understanding Generalization and Stochastic Gradient Descent (2017). Samuel Smith, Quoc V. Le
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks (2017). Elad Hoffer, Itay Hubara, Daniel Soudry
- On the Generalization Benefit of Noise in Stochastic Gradient Descent (2020). Samuel Smith, Erich Elsen, Soham De
- A Bayesian Perspective on Training Speed and Model Selection (2020). Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk
- On the different regimes of Stochastic Gradient Descent (2023). Antonio Sclocchi, Matthieu Wyart
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (2016). Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tang
+
Interplay Between Optimization and Generalization of Stochastic Gradient Descent with Covariance Noise.
2019
Yeming Wen
Kevin Luk
Maxime Gazeau
Guodong Zhang
Harris Chan
Jimmy Ba
+
Three Factors Influencing Minima in SGD
2017
Stanisław Jastrzȩbski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
+
Understanding deep learning requires rethinking generalization
2016
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
+
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
2023
Nikhil Ghosh
Spencer Frei
Wooseok Ha
Bin Yu
+
PDF
Chat
Deep learning: a statistical viewpoint
2021
Peter L. Bartlett
Andrea Montanari
Alexander Rakhlin
- How Two-Layer Neural Networks Learn, One (Giant) Step at a Time (2023). Yatin Dandi, Florent Krząkała, Bruno Loureiro, Luca Pesce, Ludovic Stephan
- An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise (2019). Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba
- Stochastic Training is Not Necessary for Generalization (2021). Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
- The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent (2020). X. Qian, Diego Klabjan
Works That Cite This (8)
- TherML: Thermodynamics of Machine Learning (2018). Alexander A. Alemi, Ian S. Fischer
- An Empirical Model of Large-Batch Training (2018). Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
- Training Tips for the Transformer Model (2018). Martin Popel, Ondřej Bojar
- Hessian-based Analysis of Large Batch Training and Robustness to Adversaries (2018). Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
- Strength of Minibatch Noise in SGD (2021). Ziyin Liu, Kangqiao Liu, T. Mori, Masahito Ueda
- SALR: Sharpness-aware Learning Rates for Improved Generalization (2021). Xubo Yue, Maher Nouiehed, Raed Al Kontar
- DNN or $k$-NN: That is the Generalize vs. Memorize Question (2018). Guy Cohen, Raja Giryes, Guillermo Sapiro
- Large Scale Language Modeling: Converging on 40GB of Text in Four Hours (2018). Raul Puri, Robert M. Kirby, Nikolai Yakovenko, Bryan Catanzaro
Works Cited by This (0)