Disentangling feature and lazy training in deep neural networks
Abstract
Two distinct limits for deep learning have been derived as the network width h → ∞, depending on how the weights of the last layer scale with h. In the neural tangent kernel (NTK) limit, the dynamics becomes linear in the weights and is described by a frozen …
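To make the role of the last-layer scaling concrete, the sketch below trains a one-hidden-layer network whose output is multiplied by h^(-α). It assumes α = 1/2 as a stand-in for the NTK scaling and α = 1 for the other standard limit (usually called the mean-field limit), with the learning rate multiplied by h in the latter case so that the output still moves at O(1) speed. The model form, the toy data, the learning-rate rule, and the function name `relative_weight_change` are illustrative assumptions, not the paper's setup. It only measures how far the hidden weights travel relative to their initial size; under the NTK scaling this distance should shrink as h grows, which is what "linear in the weights" (lazy training) amounts to.

```python
import numpy as np


def relative_weight_change(h, alpha, steps=200, lr=0.5, d=5, n=20, seed=0):
    """Hypothetical helper: train f(x) = h**(-alpha) * sum_i beta_i * tanh(w_i . x / sqrt(d))
    with full-batch gradient descent and return ||W_T - W_0|| / ||W_0||.

    alpha = 0.5 is assumed to mimic the NTK scaling, alpha = 1.0 the mean-field one.
    Only the hidden weights W are trained, a simplification for brevity."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # toy inputs (assumption)
    y = np.sign(X[:, 0])                     # toy target (assumption)
    W0 = rng.standard_normal((h, d))         # hidden weights at initialization
    beta = rng.standard_normal(h)            # fixed last-layer weights
    W = W0.copy()
    eta = lr * h ** (2 * alpha - 1)          # O(1) for alpha=0.5, O(h) for alpha=1
    for _ in range(steps):
        Z = X @ W.T / np.sqrt(d)             # (n, h) pre-activations
        f = h ** (-alpha) * np.tanh(Z) @ beta  # (n,) network outputs
        r = (f - y) / n                      # d/df of 0.5 * mean squared error
        # G[i, j] = sum_mu r_mu * beta_i * tanh'(Z[mu, i]) * X[mu, j]
        G = ((r[:, None] * (1 - np.tanh(Z) ** 2)) * beta[None, :]).T @ X
        W -= eta * h ** (-alpha) * G / np.sqrt(d)
    return np.linalg.norm(W - W0) / np.linalg.norm(W0)


for width in (10, 100, 1000):
    print(width,
          relative_weight_change(width, alpha=0.5),   # NTK-style scaling
          relative_weight_change(width, alpha=1.0))   # mean-field-style scaling
```

Under these assumptions the NTK-style column should decrease roughly like h^(-1/2), while the mean-field-style column should stay of order one, i.e. the weights (and hence the features) keep moving at large width.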