Automatic Differentiation is Essential in Training Neural Networks for Solving Differential Equations

Locations

  • Journal of Scientific Computing
  • arXiv (Cornell University)

Summary

This work provides a quantitative demonstration of the superior performance of Automatic Differentiation (AD) over Finite Difference (FD) methods in the training process of neural networks for solving Partial Differential Equations (PDEs). While both AD and FD are used to compute derivatives within neural network-based PDE solvers like Physics-Informed Neural Networks (PINNs), a long-standing debate exists regarding their efficacy. This paper specifically addresses the impact of these differentiation approaches on training error and speed, arguing that AD offers significant advantages in this regard.
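The difference between the two differentiation approaches can be illustrated with a small sketch. It is not taken from the paper. It assumes a single tanh feature with hypothetical parameters `w` and `b`, uses the closed-form second derivative as a stand-in for what AD would return (AD is exact up to floating-point rounding), and uses a central difference with illustrative step sizes `h` in the role of FD.

```python
import numpy as np

def feature(x, w, b):
    """A single tanh feature, u(x) = tanh(w*x + b)."""
    return np.tanh(w * x + b)

def d2_exact(x, w, b):
    """Closed-form u''(x); stands in for the value AD would compute."""
    t = np.tanh(w * x + b)
    return w**2 * (-2.0 * t * (1.0 - t**2))

def d2_fd(x, w, b, h):
    """Second-order central difference approximation of u''(x)."""
    return (feature(x + h, w, b) - 2.0 * feature(x, w, b)
            + feature(x - h, w, b)) / h**2

x = np.linspace(-1.0, 1.0, 101)
w, b = 1.5, 0.3  # illustrative values, not from the paper

# Maximum FD error over the sample points for several step sizes.
errors = {
    h: float(np.max(np.abs(d2_fd(x, w, b, h) - d2_exact(x, w, b))))
    for h in (1e-1, 1e-2, 1e-3)
}
for h, err in errors.items():
    print(f"h = {h:g}: max FD error = {err:.3e}")
```

The FD error shrinks roughly in proportion to `h**2`, whereas the exact derivative needs only the sample points themselves and no neighbouring points.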

A key innovation is the introduction of truncated entropy, a novel metric derived from the effective cut-off number of singular values. This metric serves to characterize the training dynamics, with higher truncated entropy correlating with faster training speeds and lower residual loss. The authors demonstrate both theoretically and experimentally that AD consistently exhibits higher truncated entropy compared to FD, thereby predicting its superior training performance.

The paper offers a detailed analytical and empirical investigation based on two neural network architectures: Random Feature Models (RFMs) and two-layer neural networks.

For RFMs, where PDE solving can be cast as a linear least-squares problem (\(Aa=f\)), the analysis focuses on the singular values of the system matrix \(A\). The findings reveal that:
1. Large singular values of the AD-derived matrix (\(A_{AD}\)) and FD-derived matrix (\(A_{FD}\)) are largely similar, consistent with FD being a numerical approximation of AD.
2. Crucially, small singular values of \(A_{FD}\) are consistently larger than those of \(A_{AD}\). This discrepancy is significant because solving \(Aa=f\) often involves computing the pseudo-inverse of \(A\), and very small singular values can introduce substantial computational errors or numerical instability. By effectively having “smaller” small singular values, \(A_{AD}\) allows for a more accurate pseudo-inverse calculation through truncation, leading to lower training error.
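A minimal sketch of this kind of comparison is given below. It is an illustration under stated assumptions, not the paper's experimental setup: the problem is a 1D Poisson equation \(-u''=f\) on \([-1,1]\) with a manufactured solution \(u=\sin(\pi x)\), the features are random tanh units, `A_ad` is built from closed-form derivatives (standing in for AD), `A_fd` from central differences on the collocation grid, and the truncation level `rcond` is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_points = 50, 200           # illustrative sizes
x = np.linspace(-1.0, 1.0, n_points)
h = x[1] - x[0]                          # FD step = grid spacing
w = rng.uniform(-5.0, 5.0, n_features)   # fixed random inner weights
b = rng.uniform(-5.0, 5.0, n_features)   # fixed random biases

def phi(pts):
    """Random tanh features, shape (len(pts), n_features)."""
    return np.tanh(np.outer(pts, w) + b)

def phi_xx_exact(pts):
    """Closed-form second derivative of each feature (AD stand-in)."""
    t = phi(pts)
    return (w**2) * (-2.0 * t * (1.0 - t**2))

def phi_xx_fd(pts):
    """Central-difference second derivative of each feature."""
    return (phi(pts + h) - 2.0 * phi(pts) + phi(pts - h)) / h**2

interior = x[1:-1]
boundary = np.array([-1.0, 1.0])

def system_matrix(second_derivative):
    """Rows: -phi'' at interior points, then phi at the two boundary points."""
    return np.vstack([-second_derivative(interior), phi(boundary)])

A_ad = system_matrix(phi_xx_exact)
A_fd = system_matrix(phi_xx_fd)

# Singular values in descending order.
s_ad = np.linalg.svd(A_ad, compute_uv=False)
s_fd = np.linalg.svd(A_fd, compute_uv=False)
print("largest :", s_ad[:3], s_fd[:3])
print("smallest:", s_ad[-3:], s_fd[-3:])

# Right-hand side for u = sin(pi x): f = pi^2 sin(pi x), u(+-1) = 0.
f = np.concatenate([np.pi**2 * np.sin(np.pi * interior), np.zeros(2)])

def solve_truncated(A, rhs, rcond=1e-10):
    """Least-squares solve via a pseudo-inverse that drops small singular values."""
    return np.linalg.pinv(A, rcond=rcond) @ rhs

a_ad = solve_truncated(A_ad, f)
residual_ad = np.linalg.norm(A_ad @ a_ad - f) / np.linalg.norm(f)
print("relative residual (exact-derivative system):", residual_ad)
```

Printing both ends of the two spectra makes it possible to inspect where the matrices agree and where they differ; the values depend on the random seed, the feature scale, and the grid spacing chosen here.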

For two-layer neural networks, where training involves non-convex optimization via gradient descent, the analysis shifts to the eigenvalues of the kernel matrix \(G\), which governs the gradient descent dynamics. Similar to the RFM case, the authors find that:
1. Large eigenvalues of the AD-derived kernel (\(G_{AD}\)) and FD-derived kernel (\(G_{FD}\)) are comparable.
2. However, the small eigenvalues of \(G_{FD}\) are again larger than those of \(G_{AD}\). This implies that for FD, a greater number of small, less significant eigenvalues are present in the kernel, which slows down the convergence of gradient descent, ultimately resulting in slower training and potentially higher final training error compared to AD.
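The role of the kernel spectrum can be sketched in a simplified linear setting, which is an assumption made here for illustration and not the paper's two-layer analysis. For a least-squares loss \(\tfrac12\|Aa-f\|^2\), one gradient step with learning rate \(\eta\) maps the residual \(r=Aa-f\) to \((I-\eta G)r\) with \(G=AA^{\top}\), so the component of \(r\) along an eigenvector with eigenvalue \(\lambda_i\) shrinks by a factor \((1-\eta\lambda_i)\) per step. The eigenvalues, `eta`, and `steps` below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# A symmetric positive-definite kernel with prescribed eigenvalues.
eigenvalues = np.array([1.0, 1e-1, 1e-3])
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # orthonormal eigenvectors
G = Q @ np.diag(eigenvalues) @ Q.T

eta, steps = 0.5, 200
r = Q @ np.ones(3)            # residual with unit weight on every eigen-direction
for _ in range(steps):
    r = r - eta * (G @ r)     # residual update r <- (I - eta * G) r

coeffs = Q.T @ r                                  # residual in the eigenbasis
predicted = (1.0 - eta * eigenvalues) ** steps    # closed-form decay factors

for lam, c, p in zip(eigenvalues, coeffs, predicted):
    print(f"lambda = {lam:g}: remaining = {c:.3e}, predicted = {p:.3e}")
```

Directions associated with small eigenvalues retain most of their residual after many steps, which is why the lower end of the spectrum controls how quickly training converges.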

Through comprehensive experimental validation across various PDEs (1D Poisson, 2D Poisson, Biharmonic, Allen-Cahn) and network structures (RFM, shallow, deep NNs), the paper confirms these theoretical insights, consistently showing AD’s faster convergence and lower training errors.

The work builds upon established foundations in:
* Physics-Informed Neural Networks (PINNs): A prominent framework for using neural networks to solve differential equations.
* Automatic Differentiation (AD): The cornerstone technique for exact gradient computation in deep learning, enabling backpropagation.
* Numerical Differentiation (e.g., Finite Difference): Traditional methods for approximating derivatives, often simpler but less precise than AD.
* Random Feature Models: A simplified neural network architecture that reduces training to a convex optimization problem, facilitating analytical study.
* Spectral Analysis (Singular Value Decomposition, Eigenvalue Analysis): Standard tools from linear algebra for understanding matrix properties and their impact on system solutions and optimization dynamics.
* Optimization Algorithms (Gradient Descent): The fundamental iterative methods used to train neural networks.
* Concepts of Spectral Entropy and Effective Rank: From information theory and linear algebra, providing a basis for the newly defined “truncated entropy.”
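For orientation, the spectral entropy and effective rank of a set of singular values can be computed as sketched below. The effective rank here is the exponential of the Shannon entropy of the normalized singular values. The relative cut-off `rel_cutoff` is a hypothetical way of discarding small singular values before computing the entropy; it is an assumption for illustration and should not be read as the paper's definition of truncated entropy.

```python
import numpy as np

def spectral_entropy(singular_values, rel_cutoff=0.0):
    """Shannon entropy of the normalized singular values.

    Singular values not exceeding rel_cutoff * max are dropped first
    (with rel_cutoff=0.0 only exact zeros are dropped).
    """
    s = np.asarray(singular_values, dtype=float)
    s = s[s > rel_cutoff * s.max()]
    p = s / s.sum()
    return float(-np.sum(p * np.log(p)))

def effective_rank(singular_values, rel_cutoff=0.0):
    """Effective rank: exponential of the spectral entropy."""
    return float(np.exp(spectral_entropy(singular_values, rel_cutoff)))

flat = np.ones(4)                      # four equal singular values
decaying = np.array([1.0, 1.0, 1e-12]) # two significant values, one negligible

print(effective_rank(flat))
print(effective_rank(decaying, rel_cutoff=1e-8))
```

A flat spectrum yields an effective rank equal to the number of retained singular values, while a rapidly decaying spectrum yields a smaller one.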

Neural network-based approaches have recently shown significant promise in solving partial differential equations (PDEs) in science and engineering, especially in scenarios featuring complex domains or the incorporation of empirical data. One advantage of the neural network method for PDEs lies in its automatic differentiation (AD), which necessitates only the sample points themselves, unlike traditional finite difference (FD) approximations that require nearby local points to compute derivatives. In this paper, we quantitatively demonstrate the advantage of AD in training neural networks. The concept of truncated entropy is introduced to characterize the training property. Specifically, through comprehensive experimental and theoretical analyses conducted on random feature models and two-layer neural networks, we discover that the defined truncated entropy serves as a reliable metric for quantifying the residual loss of random feature models and the training speed of neural networks for both AD and FD methods. Our experimental and theoretical analyses demonstrate that, from a training perspective, AD outperforms FD in solving partial differential equations.
Does the use of auto-differentiation yield reasonable updates to deep neural networks that represent neural ODEs? Through mathematical analysis and numerical evidence, we find that when the neural network employs high-order forms to approximate the underlying ODE flows (such as the Linear Multistep Method (LMM)), brute-force computation using auto-differentiation often produces non-converging artificial oscillations. In the case of Leapfrog, we propose a straightforward post-processing technique that effectively eliminates these oscillations, rectifies the gradient computation and thus respects the updates of the underlying flow.
Neural networks have been shown to have the ability to solve differential equations (Chakraverty & Mall
Neural networks have emerged as powerful tools for constructing numerical solution methods for partial differential equations (PDEs). This review article provides an accessible introduction to recent developments in the field of scientific machine learning, focusing on methods such as Physics-Informed Neural Networks (PINNs), Deep Galerkin Methods (DGM), Deep Ritz Methods, and Neural Operator Methods. We compare these approaches, highlighting their strengths, limitations, and potential areas for improvement. Furthermore, we explore a variety of real-world applications where these neural network-based PDE solvers have been successfully implemented. Finally, we discuss future directions and the ongoing challenges in this rapidly evolving research area.
Finite difference equations are considered to solve differential equations numerically by utilizing minimization algorithms. Neural minimization algorithms for solving the finite difference equations are presented. Results of numerical simulation are described to demonstrate the method. Methods of implementing the algorithms are discussed. General features of the neural algorithms are discussed.
Differential equations emerge in various scientific and engineering domains for modeling physical phenomena. Most differential equations of practical interest are analytically intractable. Traditionally, differential equations are solved by numerical methods. Sophisticated algorithms exist to integrate differential equations in time and space. Time integration techniques continue to be an active area of research and include backward difference formulas and Runge-Kutta methods (Conde, Gottlieb, Grant, & Shadid, 2017). Common spatial discretization approaches include the finite difference method (FDM), finite volume method (FVM), and finite element method (FEM) as well as spectral methods such as the Fourier-spectral method. These classical methods have been studied in detail and much is known about their convergence properties. Moreover, highly optimized codes exist for solving differential equations of practical interest with these techniques (Seefeldt et al., 2017; Smith & Abeysinghe, 2017). While these methods are efficient and well-studied, their expressibility is limited by their function representation.
Recent work has shown that forward- and reverse-mode automatic differentiation (AD) over the reals is almost always correct in a mathematically precise sense. However, actual programs work with machine-representable numbers (e.g., floating-point numbers), not reals. In this paper, we study the correctness of AD when the parameter space of a neural network consists solely of machine-representable numbers. In particular, we analyze two sets of parameters on which AD can be incorrect: the incorrect set on which the network is differentiable but AD does not compute its derivative, and the non-differentiable set on which the network is non-differentiable. For a neural network with bias parameters, we first prove that the incorrect set is always empty. We then prove a tight bound on the size of the non-differentiable set, which is linear in the number of non-differentiabilities in activation functions, and give a simple necessary and sufficient condition for a parameter to be in this set. We further prove that AD always computes a Clarke subderivative even on the non-differentiable set. We also extend these results to neural networks possibly without bias parameters.
In this work, we describe a novel approach to building a neural PDE solver leveraging recent advances in transformer based neural network architectures. Our model can provide solutions for different values of PDE parameters without any need for retraining the network. The training is carried out in a self-supervised manner, similar to pretraining approaches applied in language and vision tasks. We hypothesize that the model is in effect learning a family of operators (for multiple parameters) mapping the initial condition to the solution of the PDE at any future time step t. We compare this approach with the Fourier Neural Operator (FNO), and demonstrate that it can generalize over the space of PDE parameters, despite having a higher prediction error for individual parameter values compared to the FNO. We show that performance on a specific parameter can be improved by finetuning the model with very small amounts of data. We also demonstrate that the model scales with data as well as model size.
This paper presents artificial neural networks (ANNs) for solving ordinary differential equations (ODEs) with modified back propagation (mBP). The multilayer perceptron neural networks (MPNNs) are chosen as ANNs model which have universal approximation power that is beneficial in solving ODEs. This mBP training algorithm which has additional momentum is employed to update the network parameters in the way of unsupervised training. The developed method is applied to solve initial value problems (IVPs) and boundary value problems (BVPs) of ODEs. Simulation results of MPNNs are compared with analytic solutions to show that solutions of ODEs with high accuracy of approximation and fast convergence are obtained by means of ANNs.
Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are typically expressed as additive iterative update processes which have straightforward ODE counterparts. Here we introduce a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets. This yields continuous-time counterparts of Fast Weight Programmers and linear Transformers. Our novel models outperform the best existing Neural Controlled Differential Equation based models on various time series classification tasks, while also addressing their fundamental scalability limitations. Our code is public.
We present a method to solve initial and boundary value problems using artificial neural networks. A trial solution of the differential equation is written as a sum of two parts. The first part satisfies the initial/boundary conditions and contains no adjustable parameters. The second part is constructed so as not to affect the initial/boundary conditions. This part involves a feedforward neural network containing adjustable parameters (the weights). Hence by construction the initial/boundary conditions are satisfied and the network is trained to satisfy the differential equation. The applicability of this approach ranges from single ordinary differential equations (ODE's), to systems of coupled ODE's and also to partial differential equations (PDE's). In this article, we illustrate the method by solving a variety of model problems and present comparisons with solutions obtained using the Galerkin finite element method for several cases of partial differential equations. With the advent of neuroprocessors and digital signal processors the method becomes particularly interesting due to the expected essential gains in the execution speed.
Ordinary and partial differential equations (DE) are used extensively in scientific and mathematical domains to model physical systems. Current literature has focused primarily on deep neural network (DNN) based methods for solving a specific DE or a family of DEs. Research communities with a history of using DE models may view DNN-based differential equation solvers (DNN-DEs) as a faster and transferable alternative to current numerical methods. However, there is a lack of systematic surveys detailing the use of DNN-DE methods across physical application domains and a generalized taxonomy to guide future research. This paper surveys and classifies previous works and provides an educational tutorial for senior practitioners, professionals, and graduate students in engineering and computer science. First, we propose a taxonomy to navigate domains of DE systems studied under the umbrella of DNN-DE. Second, we examine the theory and performance of the Physics Informed Neural Network (PINN) to demonstrate how the influential DNN-DE architecture mathematically solves a system of equations. Third, to reinforce the key ideas of solving and discovery of DEs using DNN, we provide a tutorial using DeepXDE, a Python package for developing PINNs, to develop DNN-DEs for solving and discovering a classic DE, the linear transport equation.
Stiff ordinary differential equations (ODEs) are common in many science and engineering fields, but standard neural ODE approaches struggle to accurately learn these stiff systems, posing a significant barrier to the widespread adoption of neural ODEs. In our earlier work, we addressed this challenge by utilizing single-step implicit methods for solving stiff neural ODEs. While effective, these implicit methods are computationally costly and can be complex to implement. This paper expands on our earlier work by exploring explicit exponential integration methods as a more efficient alternative. We evaluate the potential of these explicit methods to handle stiff dynamics in neural ODEs, aiming to enhance their applicability to a broader range of scientific and engineering problems. We found the integrating factor Euler (IF Euler) method to excel in stability and efficiency. While implicit schemes failed to train the stiff van der Pol oscillator, the IF Euler method succeeded, even with large step sizes. However, IF Euler’s first-order accuracy limits its use, leaving the development of higher-order methods for stiff neural ODEs an open research problem.
Deep learning has become a popular tool across many scientific fields, including the study of differential equations, particularly partial differential equations. This work introduces the basic principles of deep learning and the Deep Galerkin method, which uses deep neural networks to solve differential equations. This primer aims to provide technical and practical insights into the Deep Galerkin method and its implementation. We demonstrate how to solve the one-dimensional heat equation step-by-step. We also show how to apply the Deep Galerkin method to solve systems of ordinary differential equations and integral equations, such as the Fredholm of the second kind. Additionally, we provide code snippets within the text and the complete source code on Github. The examples are designed so that one can run them on a simple computer without needing a GPU.
This is the 2005 second edition of a highly successful and well-respected textbook on the numerical techniques used to solve partial differential equations arising from mathematical models in science, engineering and other fields. The authors maintain an emphasis on finite difference methods for simple but representative examples of parabolic, hyperbolic and elliptic equations from the first edition. However this is augmented by new sections on finite volume methods, modified equation analysis, symplectic integration schemes, convection-diffusion problems, multigrid, and conjugate gradient methods; and several sections, including that on the energy method of analysis, have been extensively rewritten to reflect modern developments. Already an excellent choice for students and teachers in mathematics, engineering and computer science departments, the revised text includes more of the latest theoretical and industrial developments.
From the Publisher: This book explains the physics behind the recipes of molecular simulation for materials science. Computer simulators are continuously confronted with questions concerning the choice of a particular technique for a given application. Since a wide variety of computational tools exists, the choice of technique requires a good understanding of the basic principles. More importantly, such understanding may greatly improve the efficiency of a simulation program. The implementation of simulation methods is illustrated in pseudocodes and their practical use in the case studies used in the text. Examples are included that highlight current applications, and the codes of the case studies are available on the World Wide Web. No prior knowledge of computer simulation is assumed.
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these conditions, it is also possible to construct networks that provide a geometric order of approximation for analytic target functions. The permissible activation functions include the squashing function \((1 + e^{-x})^{-1}\) as well as a variety of radial basis functions. Our proofs are constructive. The weights and thresholds of our networks are chosen independently of the target function; we give explicit formulas for the coefficients as simple, continuous, linear functionals of the target function.
Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order \(O(1/n)\), where \(n\) is the number of nodes. The approximated function is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform. The nonlinear parameters associated with the sigmoidal nodes, as well as the parameters of linear combination, are adjusted in the approximation. In contrast, it is shown that for series expansions with \(n\) terms, in which only the parameters of linear combination are adjusted, the integrated squared approximation error cannot be made smaller than order \(1/n^{2/d}\) uniformly for functions satisfying the same smoothness assumption, where \(d\) is the dimension of the input to the function. For the class of functions examined, the approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings.
Random networks of nonlinear functions have a long history of empirical success in function fitting but few theoretical guarantees. In this paper, using techniques from probability on Banach Spaces, we analyze a specific architecture of random nonlinearities, provide \(L_\infty\) and \(L_2\) error bounds for approximating functions in Reproducing Kernel Hilbert Spaces, and discuss scenarios when these expansions are dense in the continuous functions. We discuss connections between these random nonlinear networks and popular machine learning algorithms and show experimentally that these networks provide competitive performance at far lower computational cost on large-scale pattern recognition tasks.
Significance: Partial differential equations (PDEs) are among the most ubiquitous tools used in modeling problems in nature. However, solving high-dimensional PDEs has been notoriously difficult due to the “curse of dimensionality.” This paper introduces a practical algorithm for solving nonlinear PDEs in very high (hundreds and potentially thousands of) dimensions. Numerical results suggest that the proposed algorithm is quite effective for a wide variety of problems, in terms of both accuracy and speed. We believe that this opens up a host of possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their interrelationships.
Papers from the 2006 flagship meeting on neural computation, with contributions from physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation and machine learning. It draws a diverse group of attendees (physicists, neuroscientists, mathematicians, statisticians, and computer scientists) interested in theoretical and applied aspects of modeling, simulating, and building neural-like or intelligent systems. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning, and applications. Only twenty-five percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains the papers presented at the December 2006 meeting, held in Vancouver. Bradford Books imprint.
Physically plausible fluid simulations play an important role in modern computer graphics and engineering. However, in order to achieve real-time performance, computational speed needs to be traded-off with physical accuracy. Surrogate fluid models based on neural networks have the potential to achieve both, fast fluid simulations and high physical accuracy. However, these approaches rely on massive amounts of training data, require complex pipelines for training and inference or do not generalize to new fluid domains. In this work, we present significant extensions to a recently proposed deep learning framework, which addresses the aforementioned challenges in 2D. We go from 2D to 3D and propose an efficient architecture to cope with the high demands of 3D grids in terms of memory and computational complexity. Furthermore, we condition the neural fluid model on additional information about the fluid's viscosity and density which allows simulating laminar as well as turbulent flows based on the same surrogate model. Our method allows to train fluid models without requiring fluid simulation data beforehand. Inference is fast and simple, as the fluid model directly maps a fluid state and boundary conditions at a moment t to a subsequent fluid state at t+dt. We obtain real-time fluid simulations on a 128x64x64 grid that include various fluid phenomena such as the Magnus effect or Karman vortex streets and generalize to domain geometries not considered during training. Our method indicates strong improvements in terms of accuracy, speed and generalization capabilities over current 3D NN-based fluid models.
Physics-informed neural networks approximate solutions of PDEs by minimizing pointwise residuals. We derive rigorous bounds on the error, incurred by PINNs in approximating the solutions of a large class of linear parabolic PDEs, namely Kolmogorov equations that include the heat equation and Black-Scholes equation of option pricing, as examples. We construct neural networks, whose PINN residual (generalization error) can be made as small as desired. We also prove that the total \(L^2\)-error can be bounded by the generalization error, which in turn is bounded in terms of the training error, provided that a sufficient number of randomly chosen training (collocation) points is used. Moreover, we prove that the size of the PINNs and the number of training samples only grow polynomially with the underlying dimension, enabling PINNs to overcome the curse of dimensionality in this context. These results enable us to provide a comprehensive error analysis for PINNs in approximating Kolmogorov PDEs.
Submitted: 22 June 2020. Accepted: 21 June 2021. Published online: 30 September 2021. Keywords: deep ReLU network, smooth function, polynomial approximation, function composition, curse of dimensionality. AMS Subject Headings: 78M32, 41A25, 41A63. ISSN (print): 0036-1410. ISSN (online): 1095-7154. Publisher: Society for Industrial and Applied Mathematics. CODEN: sjmaah.
Stochastic gradient Langevin dynamics (SGLD) is a standard sampling technique for uncertainty estimation in Bayesian neural networks. Past methods have shown improved convergence by including a preconditioning of SGLD based on RMSprop. This preconditioning serves to adapt to the local geometry of the parameter space and improve the performance of deep neural networks. In this paper, we develop another preconditioning technique to accelerate training and improve convergence by incorporating a recently developed batch normalization preconditioning (BNP) into our methods. BNP uses mini-batch statistics to improve the conditioning of the Hessian of the loss function in traditional neural networks and thus improve convergence. We will show that applying BNP to SGLD will improve the conditioning of the Fisher information matrix, which improves the convergence. We present the results of this method on three experiments, including a simulation example, a contextual bandit example, and a residual network, which show the improved initial convergence provided by BNP, in addition to an improved condition number from this method.
In recent engineering applications using deep learning, the physics-informed neural network (PINN) is a new development, as it can exploit the underlying physics of engineering systems. The novelty of PINNs lies in the use of partial differential equations (PDEs) for the loss function. Most PINNs are implemented using automatic differentiation (AD) for training the PDE loss functions. A less well-known alternative is the finite difference method (FDM). Unlike an AD-based PINN, an immediate benefit of an FDM-based PINN is low implementation cost. In this paper, we propose the use of the finite difference method for estimating the PDE loss functions in PINNs. Our work is inspired by computational analysis in electromagnetic systems, which traditionally solves Laplace's equation using successive over-relaxation. In the case of Laplace's equation, our PINN approach can be seen as taking the Laplacian filter response of the neural network output as the loss function. Thus, the implementation of a PINN can be very simple. In our experiments, we tested PINNs on Laplace's equation and Burgers' equation. We showed that, using FDM, the PINN consistently outperforms non-PINN-based deep learning. Compared to AD-based PINNs, we showed that our method is faster to compute and on par in terms of error reduction.
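The idea of using the Laplacian filter response as a loss can be illustrated with a minimal NumPy sketch. This is not the cited paper's implementation: the function names `laplacian_filter` and `laplace_residual_loss` are hypothetical, a uniform grid with the standard five-point stencil is assumed, and two analytic grid functions stand in for a neural network's output.

```python
import numpy as np

def laplacian_filter(u, h):
    """Five-point stencil response on the interior of a 2D grid `u` with spacing `h`."""
    return (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
        - 4.0 * u[1:-1, 1:-1]
    ) / h**2

def laplace_residual_loss(u, h):
    """Mean squared filter response; near zero when `u` is discretely harmonic."""
    return float(np.mean(laplacian_filter(u, h) ** 2))

# Uniform grid on the unit square (assumed for illustration).
n = 33
xs = np.linspace(0.0, 1.0, n)
h = xs[1] - xs[0]
X, Y = np.meshgrid(xs, xs, indexing="ij")

# Stand-ins for a network's output evaluated on the grid.
harmonic = X**2 - Y**2      # satisfies Laplace's equation
non_harmonic = X**2 + Y**2  # has Laplacian equal to 4 everywhere

loss_harmonic = laplace_residual_loss(harmonic, h)
loss_non_harmonic = laplace_residual_loss(non_harmonic, h)
print(loss_harmonic, loss_non_harmonic)
```

In an FDM-based PINN of this kind, `u` would be the network's prediction on the grid and the loss would be minimized with respect to the network parameters; boundary conditions would need a separate term, which this sketch omits.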
This paper develops a new class of nonlinear acceleration algorithms based on extending conjugate residual-type procedures from linear to nonlinear equations. The main algorithm has strong similarities with Anderson acceleration as well as with inexact Newton methods, depending on which variant is implemented. We prove theoretically and verify experimentally, on a variety of problems from simulation experiments to deep learning applications, that our method is a powerful accelerated iterative algorithm. The code is available at https://github.com/Data-driven-numerical-methods/Nonlinear-Truncated-Conjugate-Residual. Keywords: nonlinear acceleration, generalized conjugate residual, truncated GCR, Anderson acceleration, Newton's method, deep learning. MSC codes: 65F10, 68W25, 65F08, 90C53.