Two-Point Step Size Gradient Methods

Type: Article
Publication Date: 1988-01-01
Citations: 2718
DOI: https://doi.org/10.1093/imanum/8.1.141

Abstract

We derive two-point step sizes for the steepest-descent method by approximating the secant equation. At the cost of storage of an extra iterate and gradient, these algorithms achieve better performance and cheaper computation than the classical steepest-descent method. We indicate a convergence analysis of the method in the two-dimensional quadratic case. The behaviour is highly remarkable and the analysis entirely nonstandard.
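
As a concrete illustration of the step sizes the abstract refers to, a minimal sketch in NumPy is given below; the loop structure, names, and (absent) safeguards are illustrative assumptions of this sketch, not the authors' code. Only the first (BB1) choice is used, with the second variant noted in a comment.

    import numpy as np

    def bb_descent(grad, x0, n_iter=100, alpha0=1e-3):
        """Steepest descent with the 'two-point' (Barzilai-Borwein) step size."""
        x_prev = np.asarray(x0, dtype=float)
        g_prev = grad(x_prev)
        x = x_prev - alpha0 * g_prev          # ordinary first step
        for _ in range(n_iter):
            g = grad(x)
            s = x - x_prev                    # difference of the two stored iterates
            y = g - g_prev                    # difference of the two stored gradients
            # Approximate the secant equation with a scalar matrix (1/alpha)*I:
            #   BB1: alpha = s's / s'y    (BB2 variant: alpha = s'y / y'y)
            alpha = (s @ s) / (s @ y)
            x_prev, g_prev = x, g
            x = x - alpha * g                 # only one extra iterate/gradient is kept
        return x

For a strictly convex quadratic f(x) = 0.5 x'Ax - b'x, grad is simply x -> A @ x - b, which is the setting analysed in the paper.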

Locations

  • IMA Journal of Numerical Analysis

Step gradient

2015-12-17
Barrier methods solve the nonlinear problem by solving a sequence of penalized problems. The link between this so-called outer sequence of solutions of the penalized functions and the solution of the original problem was established in the 1960s. In this thesis, we used a logarithmic barrier function. At each outer iteration, the SQP technique produces a series of quadratic subproblems whose solutions form a so-called inner sequence of descent directions for solving the penalized nonlinear problem. We introduced a change of variable on the displacement step, which yields numerically more stable optimality conditions. We carried out numerical simulations to compare the performance of the conjugate gradient method with that of the D.C. method, both applied to quadratic trust-region problems. We adapted the D.C. method to solve the vertical subproblems, which reduces their dimensions from $n+m$ to $m+p$ ($p$ …). The evolution of the algorithm is controlled by the merit function. Numerical tests compare the advantages of different forms of the merit function. We introduced new rules to improve this evolution; numerical experiments show a gain in the number of problems solved. A study of the convergence of our SDC method concludes this work.
Numerical Methods for Partial Differential Equations is an international journal that publishes the highest quality research in the rigorous analysis of novel techniques for the numerical solution of partial differential equations (PDEs). The journal is intended to be accessible to a broad spectrum of researchers into numerical approximation of PDEs throughout science and engineering, with priority given to articles that focus on theoretical results describing robustness, stability, and convergence of the new methods rather than the techniques themselves and specific applications. The journal seeks to be interdisciplinary, while emphasizing numerical analysis and approximation theory in the following areas of research: discretization schemes for linear, nonlinear, and fractional PDEs; methods for optimal control and parameter estimation problems; techniques for high-dimensional spatial and parameterized PDEs; learning algorithms for data-driven solutions to PDEs; and new approaches for modeling complex phenomena with PDEs.
The limited memory steepest descent method (LMSD, Fletcher, 2012) for unconstrained optimization problems stores a few past gradients to compute multiple stepsizes at once. We review this method and propose new variants. For strictly convex quadratic objective functions, we study the numerical behavior of different techniques to compute new stepsizes. In particular, we introduce a method to improve the use of harmonic Ritz values. We also show the existence of a secant condition associated with LMSD, where the approximating Hessian is projected onto a low-dimensional space. In the general nonlinear case, we propose two new alternatives to Fletcher's method: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously. We show that Fletcher's method can also be interpreted from this viewpoint. AMS Classification: 65K05, 90C20, 90C30, 65F15, 65F10.
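
The following rough sketch, restricted to a strictly convex quadratic f(x) = 0.5 x'Ax - b'x, illustrates the mechanism described above: the last few gradients span a subspace, the Hessian is projected onto it, and the reciprocals of the resulting Ritz values are reused as the next sweep of step sizes. It uses A explicitly for clarity (Fletcher recovers the projected matrix from gradient differences alone); the names, the warm-up step, and the lack of safeguards are assumptions of this sketch.

    import numpy as np

    def lmsd_quadratic(A, b, x0, m=3, n_sweeps=30):
        """Limited-memory steepest descent sweeps for f(x) = 0.5*x'Ax - b'x."""
        x = np.asarray(x0, dtype=float)
        alphas = [1.0 / np.linalg.eigvalsh(A).max()] * m   # conservative warm-up steps
        for _ in range(n_sweeps):
            G = []
            for alpha in alphas:                    # one sweep: m gradient steps
                g = A @ x - b
                G.append(g)
                x = x - alpha * g
            Q, _ = np.linalg.qr(np.column_stack(G))  # orthonormal basis of span of past gradients
            ritz = np.linalg.eigvalsh(Q.T @ A @ Q)   # Ritz values approximate eigenvalues of A
            alphas = list(1.0 / ritz)                # m new step sizes, used in the next sweep
        return x
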
This thesis considers methods for synthesis of linear quadratic controllers for large-scale, interconnected systems. Conventional methods that solve the linear quadratic control problem are only applicable to systems of moderate size, due to the rapid increase in both computational time and memory requirements as the system size increases. The methods presented in this thesis show a much slower increase in these requirements when faced with system matrices with a sparse structure. Hence, they are useful for control design for systems of large order, since such systems usually have sparse system matrices. An equally important feature of the methods is that the controllers are restricted to have a distributed nature, meaning that they respect a potential interconnection structure of the system. The controllers considered in the thesis have the same structure as the centralized LQG solution, that is, they consist of a state predictor and feedback from the estimated states. Strategies for determining the feedback matrix and predictor matrix separately are suggested. The strategies use gradient directions of the cost function to iteratively approach a locally optimal solution of either problem. A scheme to determine bounds on the degree of suboptimality of the partial solution in every iteration is presented. It is also shown that these bounds can be combined to give a bound on the degree of suboptimality of the full output feedback controller. Another method, which treats the synthesis of the feedback matrix and predictor matrix simultaneously, is also presented. The functionality of the developed methods is illustrated by an application, where the methods are used to compute controllers for a large deformable mirror, found in a telescope to compensate for atmospheric disturbances. The model of the mirror is obtained by discretizing a partial differential equation. This gives a linear, sparse representation of the mirror with a very large state space, which is suitable for the methods presented in the thesis. The performance of the controllers is evaluated using performance measures from the adaptive optics community.
This paper proposes and analyses a subsampled Cubic Regularization Method (CRM) for solving finite-sum optimization problems. The new method uses random subsampling techniques to approximate the functions, gradients and Hessians in order to reduce the overall computational cost of the CRM. Under suitable hypotheses, first- and second-order iteration-complexity bounds and global convergence analyses are presented. We also discuss the local convergence properties of the method. Numerical experiments are presented to illustrate the performance of the proposed scheme.
In this paper, we propose a Perry-type derivative-free algorithm for solving systems of nonlinear equations. The algorithm is based on the well-known BFGS quasi-Newton method with a modified Perry's parameter. The global convergence of the algorithm is established without assumptions on the regularity or boundedness of the solution set. Moreover, the sequence of iterates generated by the algorithm converges globally to the solution of the problem provided that the function is Lipschitz continuous and monotone. Preliminary numerical experiments on a collection of general nonlinear equations and convex constrained nonlinear monotone equations demonstrate the efficiency of the algorithm. Finally, we successfully apply the proposed algorithm to solve a signal recovery problem.
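
As context for the abstract above, the sketch below shows the standard hyperplane-projection framework commonly used with derivative-free methods for monotone equations F(x) = 0. The paper's Perry-type search direction is replaced here by the plain residual direction, so this is only a generic illustration under that assumption, not the proposed algorithm; tolerances and names are illustrative.

    import numpy as np

    def projection_method(F, x0, n_iter=500, tol=1e-8, beta=0.5, sigma=1e-4):
        """Derivative-free hyperplane-projection iteration for monotone F(x) = 0."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_iter):
            Fx = F(x)
            if np.linalg.norm(Fx) < tol:
                break
            d = -Fx                                    # simple residual direction
            t = 1.0
            for _ in range(50):                        # backtracking line search
                if -(F(x + t * d) @ d) >= sigma * t * (d @ d):
                    break
                t *= beta
            z = x + t * d
            Fz = F(z)
            # Project x onto the hyperplane {u : Fz'(u - z) = 0}, which separates
            # x from the solution set when F is monotone.
            x = x - ((Fz @ (x - z)) / (Fz @ Fz)) * Fz
        return x
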
The spectral theory of higher-order symmetric tensors is an important tool for revealing some important properties of a hypergraph via its adjacency tensor, Laplacian tensor, and signless Laplacian tensor. Owing to the sparsity of these tensors, we propose an efficient approach to calculate products of these tensors and any vectors. By using the state-of-the-art L-BFGS approach, we develop a first-order optimization algorithm for computing H- and Z-eigenvalues of these large-scale sparse tensors (CEST). With the aid of the Łojasiewicz inequality, we prove that the sequence of iterates generated by CEST converges to an eigenvector of the tensor. When CEST is started from multiple random initial points, the best resulting eigenvalue attains the extreme eigenvalue with high probability. Finally, numerical experiments on small hypergraphs show that CEST is efficient and promising. Moreover, CEST is capable of computing eigenvalues of tensors related to a hypergraph with millions of vertices.
This paper is concerned with \(\ell_q\,(0<q<1)\)-norm regularized minimization problems with a twice continuously differentiable loss function. For this class of nonconvex and nonsmooth composite problems, many algorithms have been proposed to solve them, most of which are of the first-order type. In this work, we propose a hybrid of the proximal gradient method and the subspace regularized Newton method, called HpgSRN. The whole iterate sequence produced by HpgSRN is proved to have a finite length and to converge to an \(L\)-type stationary point under a mild curve-ratio condition and the Kurdyka–Łojasiewicz property of the cost function; it converges linearly if a further Kurdyka–Łojasiewicz property of exponent \(1/2\) holds. Moreover, a superlinear convergence rate for the iterate sequence is also achieved under an additional local error bound condition. Our convergence results do not require the isolatedness and strict local minimality properties of the \(L\)-stationary point. Numerical comparisons with ZeroFPR, a hybrid of the proximal gradient method and a quasi-Newton method for the forward-backward envelope of the cost function, proposed in [A. Themelis, L. Stella, and P. Patrinos, SIAM J. Optim., 28 (2018), pp. 2274–2303], for the \(\ell_q\)-norm regularized linear and logistic regressions on real data indicate that HpgSRN not only requires much less computing time but also yields comparable or even better sparsities and objective function values. Keywords: \(\ell_q\)-norm regularized composite optimization, regularized Newton method, global convergence, superlinear convergence rate, KL property, local error bound. MSC codes: 90C26, 65K05, 90C06, 49J52.
We present the gradient descent algorithm for unconstrained minimization problems using tools of fractional calculus, a field of mathematics with applications in widespread areas of science and engineering. At each iteration of the fractional-order gradient descent algorithm, the search direction is determined by means of a fractional gradient, providing a new alternative for solving large-scale minimization problems involving a useful class of functions.
Finding the stationary states of a free energy functional is an essential topic in multicomponent material systems. In this paper, we propose a class of efficient numerical algorithms to rapidly compute the stationary states of the multicomponent phase field crystal model. Our approach formulates the problem as solving a constrained non-convex minimization problem. By using the block structure of multicomponent systems, we propose an adaptive block Bregman proximal gradient algorithm that updates each order parameter alternately. The updating block can be chosen in a deterministic or random manner. The convergence property of the proposed algorithm is established without requiring a global Lipschitz constant. Numerical results on computing stationary periodic crystals and quasicrystals in the multicomponent coupled-mode Swift-Hohenberg model show significant acceleration over many existing methods.
Network modeling of high-dimensional time series data is a key learning task due to its widespread use in a number of application areas, including macroeconomics, finance, and neuroscience. While the problem of sparse modeling based on vector autoregressive models (VAR) has been investigated in depth in the literature, more complex network structures that involve low rank and group sparse components have received considerably less attention, despite their presence in data. Failure to account for low-rank structures results in spurious connectivity among the observed time series, which may lead practitioners to draw incorrect conclusions about pertinent scientific or policy questions. In order to accurately estimate a network of Granger causal interactions after accounting for latent effects, we introduce a novel approach for estimating low-rank and structured sparse high-dimensional VAR models. We introduce a regularized framework involving a combination of nuclear norm and lasso (or group lasso) penalties. Subsequently, we establish nonasymptotic probabilistic upper bounds on the estimation error rates of the low-rank and the structured sparse components. We also introduce a fast estimation algorithm and finally demonstrate the performance of the proposed modeling framework over standard sparse VAR estimates through numerical experiments on synthetic and real data.
The limited memory steepest descent method (LMSD) proposed by Fletcher is an extension of the Barzilai–Borwein 'two-point step size' strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai–Borwein strategy yields a method with an $R$-linear rate of convergence when it is employed to minimize a strongly convex quadratic. This article extends this analysis to LMSD, also for strongly convex quadratics. In particular, it is shown that, under reasonable assumptions, the method is $R$-linearly convergent for any choice of the history length parameter. The results of numerical experiments are also provided to illustrate behaviors of the method that are revealed through the theoretical analysis.
We consider the problem of adding edges to connected resistive networks in order to optimally enhance their performance. The performance is captured by the H2 norm of the closed-loop network, and ℓ1 regularization is introduced as a means to promote sparsity of the controller graph Laplacian. The resulting optimal control problem can be cast as a semidefinite program, and standard interior point method solvers can be used to efficiently compute the optimal solution for small and medium size networks. In this paper, we develop two efficient customized algorithms for large-scale problems. Our customized algorithms are based on the proximal gradient method and the sequential quadratic approximation method. In the latter, the Newton direction is obtained using a coordinate descent algorithm over the set of active variables. We provide a comparison of these methods and show that both of them can be effectively employed to solve topology identification and optimal design problems for large-scale networks.
We consider a statistical inverse learning problem, where the task is to estimate a function $f$ based on noisy point evaluations of $Af$, where $A$ is a linear operator. The function $Af$ is evaluated at i.i.d. random design points $u_n$, $n = 1, \dots, N$, generated by an unknown general probability distribution. We consider Tikhonov regularization with general convex and $p$-homogeneous penalty functionals and derive concentration rates of the regularized solution to the ground truth measured in the symmetric Bregman distance induced by the penalty functional. We derive concrete rates for Besov norm penalties and numerically demonstrate the correspondence with the observed rates in the context of X-ray tomography.
In this paper, we are concerned with the conjugate gradient method for solving unconstrained optimization problems, owing to its simplicity and the fact that it stores no matrices. We propose two spectral modifications of the conjugate descent (CD) method. Both proposed methods produce sufficient descent directions for the objective function at every iteration under strong Wolfe line searches as well as inexact line searches, and their global convergence for general non-convex functions can be guaranteed. Numerical results show the efficiency of the two proposed methods.
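
For reference, a minimal sketch of a nonlinear conjugate gradient iteration with the plain conjugate descent (CD) parameter is given below, using SciPy's Wolfe line search; the two spectral modifications proposed in the paper are not reproduced, and the restart rule and tolerances are illustrative assumptions.

    import numpy as np
    from scipy.optimize import line_search

    def cg_conjugate_descent(f, grad, x0, n_iter=200, tol=1e-8):
        """Nonlinear CG with Fletcher's conjugate descent (CD) parameter."""
        x = np.asarray(x0, dtype=float)
        g = grad(x)
        d = -g                                        # first direction: steepest descent
        for _ in range(n_iter):
            if np.linalg.norm(g) < tol:
                break
            alpha = line_search(f, grad, x, d)[0]     # Wolfe line search
            if alpha is None:                         # line search failed: restart
                d = -g
                alpha = 1e-4
            x_new = x + alpha * d
            g_new = grad(x_new)
            beta_cd = (g_new @ g_new) / -(d @ g)      # CD parameter: ||g+||^2 / (-d'g)
            d = -g_new + beta_cd * d
            x, g = x_new, g_new
        return x
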
In this work, we propose a novel algorithm to perform spectral conjugate gradient descent for an unconstrained, nonlinear optimization problem. First, we theoretically prove that the proposed method satisfies the sufficient descent condition, the conjugacy condition, and the global convergence theorem. The experimental setup uses Powell's conjugacy condition coupled with a cubic polynomial line search using strong Wolfe conditions to ensure quick convergence. The experimental results demonstrate that the proposed method shows superior performance in terms of the number of iterations to convergence and the number of function evaluations when compared to traditional methods such as Liu and Storey (LS) and Conjugate Descent (CD).
A novel linearizing alternating direction augmented Lagrangian approach is proposed for effectively solving semidefinite programs (SDP). At every iteration, by fixing the other variables, the proposed approach alternately optimizes the dual variables and the dual slack variables; then the primal variables, that is, the Lagrange multipliers, are updated. In addition, the proposed approach updates all the variables in closed form without solving any system of linear equations. Global convergence of the proposed approach is proved under mild conditions, and two numerical problems are given to demonstrate the effectiveness of the presented approach.
Based on several modified Hestenes–Stiefel and Polak–Ribière–Polyak nonlinear conjugate gradient methods, a family of three-term limited memory CG methods is developed. When the current search direction falls into the subspace spanned by the previous m directions, the algorithm branches to optimize the objective function on the subspace by using the L-BFGS method. We use this strategy to avoid potential local loops and accelerate the convergence. The proposed methods are sufficient descent. When the steplength is determined by a Wolfe or Armijo line search, we establish the global convergence of the methods in a concise way. Numerical results verify the efficiency of the proposed methods for the unconstrained optimization problems in the CUTEst library.
Hybridizing the three-term conjugate gradient method proposed by Zhang et al. and the nonlinear conjugate gradient method proposed by Dai and Liao based on the scaled memoryless BFGS update, a one-parameter class of four-term conjugate gradient methods is proposed. It is shown that the suggested class of conjugate gradient methods possesses the sufficient descent property, without convexity assumption on the objective function. A brief global convergence analysis is made for uniformly convex objective functions. Results of numerical comparisons are reported. They demonstrate the efficiency of a method of the proposed class in the sense of the Dolan–Moré performance profile.
The performance of gradient methods has been considerably improved by the introduction of delayed parameters. Recently, exploiting second-order information revealed by the iterates has given rise to Cauchy-based methods with alignment, which are generally considered the state of the art among gradient methods. This paper investigates the spectral properties of minimal gradient and asymptotically optimal steps, and then suggests three fast methods with alignment that do not use the Cauchy step. Convergence results are provided, and numerical experiments show that the new methods are competitive alternatives to the classical Cauchy-based methods. In particular, alignment gradient methods present advantages over Krylov subspace methods in some situations, which makes them attractive in practice.
The fast iterative shrinkage-thresholding algorithm (FISTA) is a widely used procedure for minimizing the sum of two convex functions, one of which has an $L$-Lipschitz continuous gradient while the other is possibly nonsmooth. FISTA's theoretical rate of convergence (RoC) is proportional to $1/(\alpha_k t_k^2)$, and it is related to (i) its extragradient rule / inertial sequence, which depends on the sequence $t_k$, and (ii) the step size $\alpha_k$, which estimates $L$; its worst-case complexity is $O(k^{-2})$ since, originally, (i) by construction $t_k \ge (k+1)/2$, and (ii) the condition $\alpha_k \ge \alpha_{k+1}$ was imposed. Attempts to improve FISTA's RoC include alternative inertial sequences and intertwining the selection of the inertial sequence and the step size. In this paper, we show that if a bounded and non-decreasing step-size sequence ($\alpha_k \le \alpha_{k+1}$, decoupled from the inertial sequence) can be generated via some adaptive scheme, then FISTA can achieve a RoC proportional to $k^{-3}$ for the indexes where the step size exhibits an approximately linear growth, with the default $O(k^{-2})$ behavior when the step size's bound is reached. Furthermore, such an exceptional step-size sequence can be easily generated, and it indeed boosts FISTA's practical performance.
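
A compact sketch of standard FISTA (fixed step size 1/L and the classical t_k sequence) applied to l1-regularized least squares is shown below for orientation; the adaptive, non-decreasing step-size scheme described above is deliberately not reproduced, and the problem instance and names are assumptions of this sketch.

    import numpy as np

    def soft_threshold(v, t):
        """Proximal operator of t*||.||_1."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def fista_lasso(A, b, lam, n_iter=500):
        """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
        L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the smooth part
        x = np.zeros(A.shape[1]); y = x.copy(); t = 1.0
        for _ in range(n_iter):
            grad = A.T @ (A @ y - b)
            x_new = soft_threshold(y - grad / L, lam / L)    # proximal gradient step
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t)) # classical inertial sequence
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)    # extragradient / inertial step
            x, t = x_new, t_new
        return x
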
We study the use of inverse harmonic Rayleigh quotients with target for the stepsize selection in gradient methods for nonlinear unconstrained optimization problems. This not only provides an elegant and flexible framework to parametrize and reinterpret existing stepsize schemes, but it also gives inspiration for new flexible and tunable families of steplengths. In particular, we analyze and extend the adaptive Barzilai–Borwein method to a new family of stepsizes. While this family exploits negative values for the target, we also consider positive targets. We present a convergence analysis for quadratic problems extending results by Dai and Liao (IMA J Numer Anal 22(1):1–10, 2002), and carry out experiments outlining the potential of the approaches.
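
As a small, hedged illustration of the kind of stepsize rule this framework parametrizes, the adaptive Barzilai–Borwein (ABB) switching rule can be written as follows; the threshold kappa and the function name are illustrative.

    import numpy as np

    def abb_step(s, y, kappa=0.8):
        """Adaptive BB rule with s = x_k - x_{k-1} and y = g_k - g_{k-1}."""
        bb1 = (s @ s) / (s @ y)     # long step (Rayleigh-quotient based)
        bb2 = (s @ y) / (y @ y)     # short step (harmonic counterpart)
        return bb2 if bb2 / bb1 < kappa else bb1
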
We present a new gradient method that uses scaling and extra updating within the diagonal updating for solving unconstrained optimization problems. The new method is in the frame of the Barzilai and Borwein (BB) method, except that the Hessian matrix is approximated by a diagonal matrix rather than the multiple of the identity matrix used in the BB method. The main idea is to design a new diagonal updating scheme that incorporates scaling to instantly reduce the large eigenvalues of the diagonal approximation and otherwise employs extra updates to increase small eigenvalues. These approaches give us rapid control of the eigenvalues of the updating matrix and thus improve stepwise convergence. We show that our method is globally convergent. The effectiveness of the method is evaluated by means of a numerical comparison with the BB method and its variant.
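
A hedged sketch of a diagonal weak-secant update of the general kind discussed above is given below: the diagonal Hessian approximation is corrected, with minimal change, so that it satisfies s'Ds = s'y. The scaling and extra-updating safeguards that distinguish the paper's method are not reproduced, and the function name is illustrative.

    import numpy as np

    def diagonal_weak_secant_update(D, s, y):
        """Update the diagonal D (1-D array) so that s' diag(D_new) s = s' y."""
        correction = (s @ y - s @ (D * s)) / np.sum(s ** 4)
        return D + correction * s ** 2

The step then uses the diagonal inverse, x_new = x - grad / D_new, in place of the scalar BB step.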
Sequential residual methods try to solve nonlinear systems of equations \(F(x)=0\) by iteratively updating the current approximate solution along a residual-related direction. Therefore, memory requirements are minimal and, consequently, these methods are attractive for solving large-scale nonlinear systems. However, the convergence of these algorithms may be slow in critical cases; therefore, acceleration procedures are welcome. In this paper, we suggest employing a variation of the sequential secant method in order to accelerate sequential residual methods. The performance of the resulting algorithm is illustrated by applying it to the solution of very large problems coming from the discretization of partial differential equations. Keywords: nonlinear systems of equations, sequential residual methods, acceleration, large-scale problems. MSC codes: 65H10, 65K05, 90C53.
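
For orientation, the baseline sequential (spectral) residual iteration that such acceleration schemes build on can be sketched as follows; the secant-based acceleration proposed in the paper, as well as the nonmonotone line search and step safeguards used in practical implementations, are omitted, and all names are illustrative.

    import numpy as np

    def spectral_residual(F, x0, n_iter=200, tol=1e-10):
        """Residual iteration for F(x) = 0 with a spectral (BB-like) step length."""
        x = np.asarray(x0, dtype=float)
        sigma = 1.0
        Fx = F(x)
        for _ in range(n_iter):
            if np.linalg.norm(Fx) < tol:
                break
            x_new = x - sigma * Fx                 # move along the residual-related direction
            Fx_new = F(x_new)
            s, y = x_new - x, Fx_new - Fx
            sigma = (s @ s) / (s @ y)              # spectral step (sign/size safeguards omitted)
            x, Fx = x_new, Fx_new
        return x
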