Statistical Mechanics of Deep Linear Neural Networks: The Backpropagating Kernel Renormalization

Type: Article

Publication Date: 2021-09-16

Citations: 12

DOI: https://doi.org/10.1103/physrevx.11.031059

Abstract

The success of deep learning in many real-world tasks has triggered an intense effort to understand the power and limitations of deep learning in the training and generalization of complex tasks, so far with limited progress. In this work, we study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs), in which the input-output function of an individual unit is linear. Despite the linearity of the units, learning in DLNNs is nonlinear; hence, studying its properties reveals some of the features of nonlinear Deep Neural Networks (DNNs). Importantly, we exactly solve for the network properties following supervised learning, using an equilibrium Gibbs distribution in the weight space. To do this, we introduce the Back-Propagating Kernel Renormalization (BPKR), which allows for the incremental integration of the network weights, starting from the network output layer and progressing backward until the first layer's weights are integrated out. This procedure allows us to evaluate important network properties, such as its generalization error, the role of network width and depth, the impact of the size of the training set, and the effects of weight regularization and learning stochasticity. BPKR does not assume specific statistics of the input or the task's output. Furthermore, by performing partial integration of the layers, the BPKR allows us to compute the properties of the neural representations across the different hidden layers. We propose an extension of the BPKR to nonlinear DNNs with ReLU activations. Surprisingly, our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks in a wide regime of parameters. Our work is the first exact statistical mechanical study of learning in a family of DNNs, and the first successful theory of learning through successive integration of degrees of freedom in the learned weight space.
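The abstract describes learning as equilibrium sampling of a Gibbs distribution over the weights of a deep linear network, with L2 weight regularization and a temperature that controls learning stochasticity. As a rough numerical illustration only (not the paper's BPKR calculation or code), the sketch below samples such a Gibbs distribution for a small deep linear network using unadjusted Langevin dynamics; all names, sizes, and hyperparameters (n_in, n_hidden, depth, beta, sigma2, step, n_steps) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch only: sample a Gibbs distribution over the weights of a
# small deep *linear* network, P(W) ~ exp(-beta * E(W)) with
#   E(W) = 0.5 * sum_i (f(x_i) - y_i)^2 + ||W||^2 / (2 * sigma2),
# using unadjusted Langevin dynamics. All hyperparameters are arbitrary demo
# choices, not values taken from the paper.

rng = np.random.default_rng(0)

n_in, n_hidden, depth, n_train = 10, 10, 3, 20
sigma2, beta, step, n_steps = 1.0, 10.0, 2e-5, 40000

# Linear teacher: targets are exactly realizable by a linear map.
X = rng.standard_normal((n_train, n_in))
w_teacher = rng.standard_normal(n_in) / np.sqrt(n_in)
y = X @ w_teacher

# Weight matrices: input layer, (depth - 2) hidden layers, scalar readout.
shapes = [(n_in, n_hidden)] + [(n_hidden, n_hidden)] * (depth - 2) + [(n_hidden, 1)]
Ws = [rng.standard_normal(s) / np.sqrt(s[0]) for s in shapes]

def forward(Ws, X):
    h = X
    for W in Ws:          # linear units: no activation function anywhere
        h = h @ W
    return h[:, 0]

def grads(Ws):
    """Gradient of E(W): backprop for the quadratic loss plus the L2 term."""
    acts = [X]
    for W in Ws:
        acts.append(acts[-1] @ W)
    delta = (acts[-1][:, 0] - y)[:, None]          # dE/d(network output)
    gs = []
    for W, a in zip(reversed(Ws), reversed(acts[:-1])):
        gs.append(a.T @ delta + W / sigma2)        # data term + regularization
        delta = delta @ W.T
    return gs[::-1]

# Langevin update: W <- W - step * beta * dE/dW + sqrt(2 * step) * noise
# samples exp(-beta * E) at long times (up to discretization error).
for _ in range(n_steps):
    gs = grads(Ws)
    Ws = [W - step * beta * g + np.sqrt(2 * step) * rng.standard_normal(W.shape)
          for W, g in zip(Ws, gs)]

X_test = rng.standard_normal((1000, n_in))
print("train MSE:", np.mean((forward(Ws, X) - y) ** 2))
print("test  MSE:", np.mean((forward(Ws, X_test) - X_test @ w_teacher) ** 2))
```

Varying beta, sigma2, n_hidden, depth, and n_train in this kind of simulation probes the same knobs (stochasticity, regularization, width, depth, training-set size) that the abstract says the BPKR treats analytically.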

Locations

  • Physical Review X
  • arXiv (Cornell University)
  • DOAJ (Directory of Open Access Journals)
  • DataCite API

Similar Works

  • Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group (2020). Qianyi Li, Haim Sompolinsky
  • The Limiting Dynamics of SGD: Modified Loss, Phase-Space Oscillations, and Anomalous Diffusion (2023). Daniel Kunin, Javier Sagastuy-BreƱa, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel Yamins
  • Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice (2017). Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli
  • Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations, and anomalous diffusion (2021). Daniel Kunin, Javier Sagastuy-BreƱa, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel Yamins
  • Weak Correlations as the Underlying Principle for Linearization of Gradient-Based Learning Systems (2024). Ori Shem-Ur, Yaron Oz
  • The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion (2021). Daniel Kunin, Javier Sagastuy-BreƱa, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel Yamins
  • Mean Field Limit of the Learning Dynamics of Multilayer Neural Networks (2019). Phan-Minh Nguyen
  • Dynamic neurons: A statistical physics approach for analyzing deep neural networks (2024). Donghee Lee, Hye-Sung Lee, J. I. Yi
  • On-line learning dynamics of ReLU neural networks using statistical physics techniques (2019). Michiel Straat, Michael Biehl
  • Layer Dynamics of Linearised Neural Nets (2019). Saurav Basu, Koyel Mukherjee, Shrihari Vasudevan
  • Neural networks: from the perceptron to deep nets (2023). Marylou GabriƩ, Surya Ganguli, Carlo Lucibello, Riccardo Zecchina
  • When and why PINNs fail to train: A neural tangent kernel perspective (2020). Sifan Wang, Xinling Yu, Paris Perdikaris
  • The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents (2024). Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka ZdeborovƔ, Florent Krzakala
  • Theory of Deep Learning III: explaining the non-overfitting puzzle (2017). Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack D. Hidary, H. N. Mhaskar
  • Lecture Notes on Linear Neural Networks: A Tale of Optimization and Generalization in Deep Learning (2024). Nadav Cohen, Noam Razin
  • Theory of Deep Learning III: explaining the non-overfitting puzzle (2018). Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack D. Hidary, H. N. Mhaskar
  • Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime (2023). Yehonatan Avidan, Qianyi Li, Haim Sompolinsky
  • The Mori-Zwanzig formulation of deep learning (2022). Daniele Venturi, Xiantao Li
  • Residual-based attention and connection to information bottleneck theory in PINNs (2023). Sokratis Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em Karniadakis
  • Critical feature learning in deep neural networks (2024). Kirsten Fischer, Javed Lindner, David Dahmen, Zohar Ringel, M. KrƤmer, Moritz Helias