Multiplier and gradient methods

Type: Article
Publication Date: 1969-11-01
Citations: 2359
DOI: https://doi.org/10.1007/bf00927673

Locations

  • Journal of Optimization Theory and Applications
In this chapter, the authors discuss some typical iterative optimization algorithms and their properties. Mostly, the algorithms are based on the gradients of the cost functions. Therefore, vector and matrix gradients are reviewed first, followed by the most typical ways to solve unconstrained and constrained optimization problems with gradient-type learning algorithms.
This paper summarizes an analysis of penalty and multiplier methods for constrained minimization, contained in Refs. [1-3]. We describe global convergence and rate of convergence results for multiplier methods, as well as a global duality theory for nonconvex problems. A main qualitative conclusion is that multiplier methods avoid, to a substantial extent, the traditional disadvantages of penalty methods (ill-conditioning, slow convergence) while retaining all of their attractive features.
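To make the multiplier iteration concrete, here is a minimal Python sketch of the method of multipliers for an equality-constrained problem; the toy objective, constraint, penalty weight, and iteration counts are illustrative assumptions, not taken from the paper.

```python
# Sketch of the method of multipliers (augmented Lagrangian) for
#   minimize f(x)  subject to  h(x) = 0.
# The inner minimization uses scipy.optimize.minimize; problem data are toy choices.
import numpy as np
from scipy.optimize import minimize

def f(x):          # objective: illustrative quadratic
    return x[0]**2 + 2.0 * x[1]**2

def h(x):          # equality constraint h(x) = 0
    return np.array([x[0] + x[1] - 1.0])

def method_of_multipliers(x, lam, c=10.0, iters=20):
    for _ in range(iters):
        # Augmented Lagrangian: f(x) + lam^T h(x) + (c/2)||h(x)||^2
        aug = lambda z: f(z) + lam @ h(z) + 0.5 * c * h(z) @ h(z)
        x = minimize(aug, x).x          # inner unconstrained minimization
        lam = lam + c * h(x)            # multiplier (dual ascent) update
    return x, lam

x_opt, lam_opt = method_of_multipliers(np.zeros(2), np.zeros(1))
print(x_opt)   # approx. [2/3, 1/3] for this toy problem
```

Unlike a pure penalty scheme, the penalty weight c can stay moderate because the multiplier update absorbs the constraint enforcement, which is the ill-conditioning advantage the summary above describes.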
Abstract The gradient method, which is also called the method of steepest descent and the Cauchy method, is one of the most fundamental derivative-based procedures for unconstrained minimization of a differentiable function. The method's performance in terms of speed of convergence is lacking, and it tends to suffer from very slow convergence, especially as a stationary point is approached. However, it does guarantee global convergence under reasonable conditions and admits a thorough mathematical analysis of its behavior. For this reason, the gradient method has been used as a starting point in the development of more sophisticated, globally convergent algorithms with better convergence properties for unconstrained minimization. This article presents a cogent overview of this fundamental method and its convergence properties under various settings.
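The following is a minimal sketch of the steepest-descent iteration with an Armijo backtracking line search, run on an ill-conditioned quadratic to exhibit the slow convergence the abstract mentions; the test function and all tolerances are illustrative choices.

```python
# Sketch of the gradient (steepest-descent) method with Armijo backtracking.
# Slow near a stationary point, as the abstract notes, but globally
# convergent under mild conditions. Problem data are illustrative.
import numpy as np

def grad_descent(f, grad, x, tol=1e-8, max_iter=10_000):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        # Armijo condition: sufficient decrease along the direction -g
        while f(x - t * g) > f(x) - 1e-4 * t * g @ g:
            t *= 0.5
        x = x - t * g
    return x

# Ill-conditioned quadratic: steepest descent zigzags toward the minimizer.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(grad_descent(f, grad, np.array([1.0, 1.0])))  # approx. [0, 0]
```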
4.1 ▪ Descent Directions Methods
In this section, we discuss fundamental methods, mostly based on gradient information, that yield descent, that is, the function value decreases at each iteration. We start with the most basic method, the steepest-descent method, analyzing its convergence under different convexity/nonconvexity assumptions on the objective function. We then discuss more general descent methods, based on descent directions other than the negative gradient, showing conditions on the search direction and the steplength that allow convergence results to be proved. We also discuss a method that makes use of Hessian information, showing that it can find a point satisfying approximate second-order optimality conditions and finding an upper bound on the number of iterations required to do so. We then discuss mirror descent, a class of gradient methods based on more general distance metrics that are particularly useful in optimizing over the unit simplex – a problem that arises often in data science. We conclude by discussing the PL condition, a generalization of the strong convexity condition that allows linear convergence rates to be proved.
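As one concrete instance of the mirror-descent family mentioned above, the sketch below implements the entropic (exponentiated-gradient) update over the unit simplex; the toy linear objective and the fixed step size are illustrative assumptions.

```python
# Sketch of mirror descent over the unit simplex with the entropy mirror map,
# i.e., the exponentiated-gradient update. Step size and the toy objective
# are illustrative choices, not from the text.
import numpy as np

def mirror_descent_simplex(grad, x, steps=500, eta=0.1):
    for _ in range(steps):
        # Multiplicative update; the normalization keeps x on the simplex.
        x = x * np.exp(-eta * grad(x))
        x = x / x.sum()
    return x

# Toy problem: minimize a linear function <c, x> over the simplex;
# the solution concentrates on the coordinate with the smallest c_i.
c = np.array([0.3, 0.1, 0.7])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
print(x)   # mass concentrates on index 1
```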
This thesis investigates and develops algorithms based on Lagrangian relaxation and the normalized multiparametric disaggregation technique to solve nonconvex mixed-integer quadratically constrained quadratic programs. First, relaxations for quadratic programming and related problem classes are reviewed. Then, the normalized multiparametric disaggregation technique is improved to a reformulated version, in which the size of the generated subproblems is reduced in the number of binary variables. Furthermore, issues related to the use of Lagrangian relaxation to solve nonconvex problems are addressed by replacing the dual subproblems with convex relaxations. The method is compared to commercial and open-source off-the-shelf global solvers using randomly generated instances. The proposed method converged in 35 of 36 instances, while Baron, the benchmark solver that obtained the best results, converged in only 4 of 36. Additionally, even for the one instance in which it did not converge, the proposed method achieved relative gaps below 1%, while Baron achieved relative gaps between 10% and 30% in most instances.
This paper introduces two decomposition-based methods for two-block mixed-integer linear programs (MILPs), which aim to take advantage of separable structures of the original problem by solving a sequence of lower-dimensional MILPs. The first method is based on the \(\ell_1\)-augmented Lagrangian method, and the second one is based on a modified alternating direction method of multipliers. In the presence of certain block-angular structures, both methods create parallel subproblems in one block of variables and add nonconvex cuts to update the other block; they converge to globally optimal solutions of the original MILP under proper conditions. Numerical experiments on three classes of MILPs demonstrate the advantages of the proposed methods on structured problems over the state-of-the-art MILP solvers. Keywords: mixed-integer linear programs, decomposition method, augmented Lagrangian method, alternating direction method of multipliers. MSC codes: 49M27, 90C11, 90C26.
In this paper, a multi-parameterized proximal point algorithm combined with a relaxation step is developed for solving convex minimization problems subject to linear constraints. We show its global convergence and sublinear convergence rate from the perspective of variational inequalities. Preliminary numerical experiments on a sparse minimization problem from signal processing indicate that the proposed algorithm performs better than some well-established methods.
The convenience of converting linearly constrained convex optimization problems to monotone variational inequalities is well recognized. Recently, we proposed a unified algorithmic framework which can guide us in constructing solution methods for these monotone variational inequalities. In this work, we revisit two full-Jacobian decompositions of the augmented Lagrangian method for separable convex programming which we studied a few years ago. In particular, exploiting this framework, we are able to give a very clear and elementary proof of the convergence of these solution methods.
In this article, we consider the misspecified optimization problem of minimizing a convex function $f(x;\theta^*)$ in $x$ over a conic constraint set represented by $h(x;\theta^*) \in \mathcal{K}$, where $\theta^*$ is an unknown (or misspecified) vector of parameters, $\mathcal{K}$ is a closed convex cone, and $h$ is affine in $x$. Suppose that $\theta^*$ is unavailable but may be learnt by a separate process that generates a sequence of estimators $\theta_k$, each of which is an increasingly accurate approximation of $\theta^*$. We develop a first-order inexact augmented Lagrangian (AL) scheme for computing an optimal solution $x^*$ corresponding to $\theta^*$ while simultaneously learning $\theta^*$.
In particular, we derive rate statements for such schemes when the penalty parameter sequence is either constant or increasing and derive bounds on the overall complexity in terms of proximal gradient steps when AL subproblems are inexactly solved via an accelerated proximal gradient scheme. Numerical results for a portfolio optimization problem with a misspecified covariance matrix suggest that these schemes perform well in practice, while naive sequential schemes may perform poorly in comparison.
We present a smooth augmented Lagrangian algorithm for semi-infinite programming (SIP). For this algorithm, we establish a perturbation theorem under mild conditions. As a corollary of the perturbation theorem, we obtain the global convergence result, that is, any accumulation point of the sequence generated by the algorithm is a solution of SIP. We obtain this global convergence result without any boundedness or coercivity condition. Another corollary of the perturbation theorem shows that the perturbation function at the zero point is lower semi-continuous if and only if the algorithm forces the sequence of objective function values to converge to the optimal value of SIP. Finally, numerical results are given.
Abstract The goal of this paper is to find a method for the Max-Cut problem that converges faster. Our strategy is a comparison between the bundle method and the augmented Lagrangian method. We also develop the theoretical convergence properties of these methods.
We focus on a linearly constrained strongly convex minimization model and discuss the application of the classical Uzawa method. Our principal goal is to show that some existing acceleration schemes can be used to accelerate the Uzawa method in the sense that the worst-case convergence rate (measured by the iteration complexity) of the resulting accelerated Uzawa schemes is $O(1/k^2)$ where $k$ represents the iteration counter. Our discussion assumes that the objective function is given by a black-box oracle; thus an inexact version of the Uzawa method with a sequence of dynamically-chosen step sizes is implemented. A worst-case convergence rate of $O(1/k)$ is also shown for this inexact version. Some preliminary numerical results are reported to verify the acceleration effectiveness of the accelerated Uzawa schemes and their superiority over some existing methods.
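For reference, here is a minimal sketch of the classical (unaccelerated) Uzawa iteration on a linearly constrained strongly convex quadratic, where the primal minimization has a closed form; the problem data and the dual step size are illustrative assumptions, and the accelerated variants of the abstract are not reproduced.

```python
# Sketch of the classical Uzawa method for
#   minimize (1/2) x^T Q x - q^T x   subject to  A x = b,
# with Q positive definite. Each step minimizes the Lagrangian in x
# (closed form here) and takes a dual ascent step. Data are illustrative.
import numpy as np

def uzawa(Q, q, A, b, alpha=0.5, iters=200):
    lam = np.zeros(A.shape[0])
    for _ in range(iters):
        x = np.linalg.solve(Q, q - A.T @ lam)  # argmin_x L(x, lam)
        lam = lam + alpha * (A @ x - b)        # dual gradient ascent
    return x, lam

Q = np.array([[2.0, 0.0], [0.0, 4.0]])
q = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, lam = uzawa(Q, q, A, b)
print(x)   # approx. [2/3, 1/3], the constrained minimizer
```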
The objective function of an optimization problem can be either unimodal or multimodal; these are also referred to as convex or non-convex problems. To solve a constrained optimization problem, two possible approaches can be considered: the direct approach and the indirect approach. The direct approach can prove very difficult to implement in situations where the admissible space is complex or difficult to circumscribe, especially when it is very small compared to the forbidden space. In the indirect approach, the problem of constrained optimization is first transformed into an unconstrained problem, and then solved by making use of any unconstrained optimization method. Numerical optimization methods rely on iterative algorithms, whose parameters usually require adjustment before achieving the desired performance. This step of adjusting parameters is akin to a learning step, which improves the accuracy of the algorithm by trial and error.
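The indirect approach described above can be sketched with a quadratic exterior penalty, which converts the constrained problem into a sequence of unconstrained ones; the objective, constraint, and penalty schedule below are illustrative assumptions.

```python
# Sketch of the indirect approach: replace
#   minimize f(x) subject to g(x) <= 0
# with an unconstrained penalized problem and increase the penalty weight.
# Function choices and schedules are illustrative.
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 2.0) ** 2

def g(x):                      # feasible set: x <= 1
    return x[0] - 1.0

def penalty_method(x, mu=1.0, rounds=8):
    for _ in range(rounds):
        # Exterior quadratic penalty: only violations g(x) > 0 are charged.
        pen = lambda z: f(z) + mu * max(g(z), 0.0) ** 2
        x = minimize(pen, x).x
        mu *= 10.0             # tighten the penalty each round
    return x

print(penalty_method(np.array([0.0])))  # approaches the constrained optimum x = 1
```

As the penalty weight grows the subproblems become ill-conditioned, which is precisely the drawback that the multiplier methods surveyed on this page were designed to avoid.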
It is well known that time-dependent Hamilton–Jacobi–Isaacs partial differential equations (HJ PDEs) play an important role in analyzing continuous dynamic games and control theory problems. An important tool for such problems when they involve geometric motion is the level set method (Osher and Sethian in J Comput Phys 79(1):12–49, 1988). This was first used for reachability problems in Mitchell et al. (IEEE Trans Autom Control 50(171):947–957, 2005) and Mitchell and Tomlin (J Sci Comput 19(1–3):323–346, 2003). The cost of these algorithms and, in fact, of all PDE numerical approximations is exponential in the space dimension and time. In Darbon (SIAM J Imaging Sci 8(4):2268–2293, 2015), some connections between HJ PDEs and convex optimization in many dimensions are presented. In this work, we propose and test methods for solving a large class of the HJ PDEs relevant to optimal control problems without the use of grids or numerical approximations. Rather, we use the classical Hopf formulas for solving initial value problems for HJ PDEs (Hopf in J Math Mech 14:951–973, 1965). We have noticed that if the Hamiltonian is convex and positively homogeneous of degree one (which the latter is for all geometrically based level set motion and control and differential game problems), very fast methods exist to solve the resulting optimization problem. This is very much related to fast methods for solving problems in compressive sensing based on $\ell_1$ optimization (Goldstein and Osher in SIAM J Imaging Sci 2(2):323–343, 2009; Yin et al. in SIAM J Imaging Sci 1(1):143–168, 2008). We seem to obtain methods which are polynomial in the dimension. Our algorithm is very fast, requires very low memory, and is totally parallelizable. We can evaluate the solution and its gradient in very high dimensions at $10^{-4}$–$10^{-8}$ s per evaluation on a laptop. We carefully explain how to compute numerically the optimal control from the numerical solution of the associated initial-value HJ PDE for a class of optimal control problems. We show that our algorithms compute all the quantities we need to obtain the controller easily. In addition, as a step often needed in this procedure, we have developed a new and equally fast way to find, in very high dimensions, the closest point $y$ lying in the union of a finite number of compact convex sets $\Omega$ to any point $x$ exterior to the $\Omega$. We can also compute the distance to these sets much faster than Dijkstra-type "fast methods," e.g., Dijkstra (Numer Math 1:269–271, 1959). The term "curse of dimensionality" was coined by Bellman (Adaptive control processes, a guided tour. Princeton University Press, Princeton, 1961; Dynamic programming. Princeton University Press, Princeton, 1957) when considering problems in dynamic optimization.
We propose a double-loop inexact accelerated proximal gradient (APG) method for a strongly convex composite optimization problem with two smooth components of different smoothness constants and computational costs. Compared to APG, the inexact APG can reduce the time complexity for finding a near-stationary point when one smooth component has higher computational cost but a smaller smoothness constant than the other. The strongly convex composite optimization problem with this property arises from subproblems of a regularized augmented Lagrangian method for affine-constrained composite convex optimization and also from the smooth approximation for bilinear saddle-point structured nonsmooth convex optimization. We show that the inexact APG method can be applied to these two problems and reduce the time complexity for finding a near-stationary solution. Numerical experiments demonstrate significantly higher efficiency of our methods over an optimal primal-dual first-order method by Hamedani and Aybat [SIAM J. Optim., 31 (2021), pp. 1299–1329] and the gradient sliding method by Lan, Ouyang, and Zhou [arXiv:2101.00143, 2021]. Keywords: first-order method, constrained optimization, saddle-point nonsmooth optimization. MSC codes: 65K05, 68Q25, 90C30, 90C60.
Summary This article extends the analysis of optimal control based on a generalized Hamiltonian which covers situations of nonconvexity. The approach offers several key advantages. First, by identifying a global solution to a constrained optimization problem, the generalized Hamiltonian approach solves the problem of distinguishing between a global optimum and the (possibly many) nonoptimal points satisfying the Pontryagin principle under nonconvexity. Second, in our generalized approach, interpreting the slopes of the separating hypersurface as shadow prices of the states continues to hold. Third, we discuss how the generalized Hamiltonian approach can be used in solving dynamic optimization problems under nonconvexity.
We present in this paper new results on the existence of saddle points of augmented Lagrangian functions for constrained nonconvex optimization. Four classes of augmented Lagrangian functions are considered: the essentially quadratic augmented Lagrangian, the exponential-type augmented Lagrangian, the modified barrier augmented Lagrangian, and the penalized exponential-type augmented Lagrangian. We first show that under second-order sufficiency conditions, all these augmented Lagrangian functions possess local saddle points. We then prove that global saddle points of these augmented Lagrangian functions exist under certain mild additional conditions. The results obtained in this paper provide a theoretical foundation for the use of augmented Lagrangians in constrained global optimization. Our findings also give new insights into the role played by augmented Lagrangians in local duality theory of constrained nonconvex optimization.
Seismic reflection tomography is a method for determining a subsurface velocity model from the traveltimes of seismic waves reflecting on geological interfaces. From an optimization viewpoint, the problem consists in minimizing a non-linear least-squares function measuring the mismatch between observed traveltimes and those calculated by ray tracing in this model. The introduction of a priori information on the model is crucial to reduce the under-determination. The contribution of this paper is to introduce a technique able to take into account geological a priori information in the reflection tomography problem expressed as inequality constraints in the optimization problem. This technique is based on a Gauss-Newton (GN) sequential quadratic programming approach. At each GN step, a solution to a convex quadratic optimization problem subject to linear constraints is computed thanks to an augmented Lagrangian algorithm. Our choice for this optimization method is motivated and its original aspects are described. First applications on real data sets are presented to illustrate the potential of the approach in practical use of reflection tomography.
Abstract In this work, in the context of Linear and convex Quadratic Programming, we consider Primal Dual Regularized Interior Point Methods (PDR-IPMs) in the framework of the Proximal Point Method. The resulting Proximal Stabilized IPM (PS-IPM) is strongly supported by theoretical results concerning convergence and the rate of convergence, and can handle degenerate problems. Moreover, in the second part of this work, we analyse the interactions between the regularization parameters and the computational footprint of the linear algebra routines used to solve the Newton linear systems. In particular, when these systems are solved using an iterative Krylov method, we are able to show—using a new rearrangement of the Schur complement which exploits regularization—that general-purpose preconditioners remain attractive for a series of subsequent IPM iterations. Indeed, if on the one hand a series of theoretical results underpin the fact that the approach presented here allows a better re-use of such computed preconditioners, on the other, we show experimentally that such (re)computations are needed only in a fraction of the total IPM iterations. The resulting regularized second-order methods, for which low-frequency updates of the preconditioners are allowed, pave the path for an alternative class of second-order methods characterized by reduced computational effort.
This paper proposes an accelerated proximal point method for maximally monotone operators. The proof is computer-assisted via the performance estimation problem approach. The proximal point method includes various well-known convex … This paper proposes an accelerated proximal point method for maximally monotone operators. The proof is computer-assisted via the performance estimation problem approach. The proximal point method includes various well-known convex optimization methods, such as the proximal method of multipliers and the alternating direction method of multipliers, and thus the proposed acceleration has wide applications. Numerical experiments are presented to demonstrate the accelerating behaviors.
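For orientation, the plain (unaccelerated) proximal point iteration that such papers accelerate can be sketched in a few lines; the choice of f(x) = |x|, whose proximal map is soft-thresholding, and the step parameter are illustrative assumptions.

```python
# Sketch of the plain proximal point iteration x_{k+1} = prox_{c f}(x_k)
# for convex f; accelerated variants add momentum terms on top of this.
import numpy as np

def prox_abs(v, c):
    # Proximal map of c*|x|: soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - c, 0.0)

def proximal_point(prox, x, c=0.5, iters=10):
    for _ in range(iters):
        x = prox(x, c)   # x_{k+1} = argmin_z f(z) + (1/(2c))||z - x||^2
    return x

print(proximal_point(prox_abs, np.array([3.0])))  # converges to 0, the minimizer of |x|
```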
In this paper we discuss a specialization of the augmented Lagrangian-type algorithm of Conn, Gould, and Toint to the solution of strictly convex quadratic programming problems with simple bounds and equality constraints. The new feature of the presented algorithm is the adaptive precision control of the solution of auxiliary problems in the inner loop of the basic algorithm which yields a rate of convergence that does not have any term that accounts for inexact solution of auxiliary problems. Moreover, boundedness of the penalty parameter is achieved for the precision control used. Numerical experiments illustrate the efficiency of the presented algorithm and encourage its usage.
We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm which incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty $l_1/l_2$-norm and the $l_1/l_\infty$-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As the core building-block of this framework, we develop new algorithms using an alternating partial-linearization/splitting technique, and we prove that the accelerated versions of these algorithms require $O(\frac{1}{\sqrt{\epsilon}})$ iterations to obtain an $\epsilon$-optimal solution. To demonstrate the efficiency and relevance of our algorithms, we test them on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.
Nonlinear optimization problems for the reliability of a complex system are solved using the generalized Lagrangian function (GLF) method and the generalized reduced gradient (GRG) method. The GLF is twice continuously differentiable and closely related to the generalized penalty function, which includes the interior and exterior penalty functions as special cases. GRG generalizes the Wolfe reduced gradient method and has been coded in FORTRAN under the title ``GREG'' by Abadie et al. Two system reliability optimization problems are solved. The first maximizes complex-system reliability with a tangent cost-function; the second minimizes the cost subject to a minimum system reliability. The results are compared with those using the Sequential Unconstrained Minimization Technique (SUMT) and the direct search approach by Luus and Jaakola (LJ). Many algorithms have been proposed for solving the general nonlinear programming problem. Only a few have been demonstrated to be effective when applied to large-scale nonlinear programming problems, and none has proved to be so superior that it can be classified as a universal algorithm. Both the GLF and GRG methods presented here have been successfully used in solving a number of general nonlinear programming problems in a variety of engineering applications and are among the better of the many available algorithms.
We propose a practical inexact augmented Lagrangian method (iALM) for nonconvex problems with nonlinear constraints. We characterize the total computational complexity of our method subject to a verifiable geometric condition, which is closely related to the Polyak-Lojasiewicz and Mangasarian-Fromovitz conditions. In particular, when a first-order solver is used for the inner iterates, we prove that iALM finds a first-order stationary point with $\tilde{\mathcal{O}}(1/\epsilon^3)$ calls to the first-order oracle. If, in addition, the problem is smooth and a second-order solver is used for the inner iterates, iALM finds a second-order stationary point with $\tilde{\mathcal{O}}(1/\epsilon^5)$ calls to the second-order oracle. These complexity results match the known theoretical results in the literature. We also provide strong numerical evidence on large-scale machine learning problems, including the Burer-Monteiro factorization of semidefinite programs, and a novel nonconvex relaxation of the standard basis pursuit template. For these examples, we also show how to verify our geometric condition.
This work proposes an accelerated primal-dual dynamical system for affine constrained convex optimization and presents a class of primal-dual methods with nonergodic convergence rates. At the continuous level, exponential decay of a novel Lyapunov function is established; at the discrete level, implicit, semi-implicit, and explicit numerical discretizations of the continuous model are considered sequentially and lead to new accelerated primal-dual methods for solving linearly constrained optimization problems. Special structures of the subproblems in those schemes are utilized to develop efficient inner solvers. In addition, nonergodic convergence rates in terms of primal-dual gap, primal objective residual, and feasibility violation are proved via a tailored discrete Lyapunov function. Moreover, our method has also been applied to decentralized distributed optimization for fast and efficient solution.
In this paper, we propose a new decomposition approach named the proximal primal dual algorithm (Prox-PDA) for smooth nonconvex linearly constrained optimization problems. The proposed approach is primal-dual based, where the primal step minimizes certain approximation of the augmented Lagrangian of the problem, and the dual step performs an approximate dual ascent. The approximation used in the primal step is able to decompose the variable blocks, making it possible to obtain simple subproblems by leveraging the problem structures. Theoretically, we show that whenever the penalty parameter in the augmented Lagrangian is larger than a given threshold, the Prox-PDA converges to the set of stationary solutions, globally and in a sublinear manner (i.e., certain measure of stationarity decreases in the rate of $\mathcal{O}(1/r)$, where $r$ is the iteration counter). Interestingly, when applying a variant of the Prox-PDA to the problem of distributed nonconvex optimization (over a connected undirected graph), the resulting algorithm coincides with the popular EXTRA algorithm [Shi et al 2014], which is only known to work in convex cases. Our analysis implies that EXTRA and its variants converge globally sublinearly to stationary solutions of certain nonconvex distributed optimization problem. There are many possible extensions of the Prox-PDA, and we present one particular extension to certain nonconvex distributed matrix factorization problem.
In this paper we study the worst-case complexity of an inexact Augmented Lagrangian method for nonconvex constrained problems. Assuming that the penalty parameters are bounded, we prove a complexity bound of $\mathcal{O}(|\log(\epsilon)|)$ outer iterations for the referred algorithm to generate an $\epsilon$-approximate KKT point, for $\epsilon\in (0,1)$. When the penalty parameters are unbounded, we prove an outer iteration complexity bound of $\mathcal{O}\left(\epsilon^{-2/(\alpha-1)}\right)$, where $\alpha>1$ controls the rate of increase of the penalty parameters. For linearly constrained problems, these bounds yield evaluation complexity bounds of $\mathcal{O}(|\log(\epsilon)|^{2}\epsilon^{-2})$ and $\mathcal{O}\left(\epsilon^{-\left(\frac{2(2+\alpha)}{\alpha-1}+2\right)}\right)$, respectively, when appropriate first-order methods ($p=1$) are used to approximately solve the unconstrained subproblems at each iteration. In the case of problems having only linear equality constraints, the latter bounds are improved to $\mathcal{O}(|\log(\epsilon)|^{2}\epsilon^{-(p+1)/p})$ and $\mathcal{O}\left(\epsilon^{-\left(\frac{4}{\alpha-1}+\frac{p+1}{p}\right)}\right)$, respectively, when appropriate $p$-order methods ($p\geq 2$) are used as inner solvers.
In this paper, we analyze the convergence of the Alternating Direction Method of Multipliers (ADMM) on convex quadratic programs (QPs) with linear equality and bound constraints. The ADMM formulation alternates between an equality constrained QP and a projection on the bounds. Under the assumptions of (i) positive definiteness of the Hessian of the objective projected on the null space of the equality constraints (reduced Hessian), and (ii) linear independence constraint qualification holding at the optimal solution, we derive an upper bound on the rate of convergence to the solution at each iteration. In particular, we provide an explicit characterization of the rate of convergence in terms of: (a) the eigenvalues of the reduced Hessian, (b) the cosine of the Friedrichs angle between the subspace spanned by the equality constraints and the subspace spanned by the gradients of the components that are active at the solution, and (c) the distance of the inactive components of the solution from the bounds. Using this analysis, we show that if the QP is feasible, the iterates converge at a Q-linear rate, and we prescribe an optimal setting for the ADMM step-size parameter. For infeasible QPs, we show that the primal variables in ADMM converge to minimizers of the Euclidean distance between the hyperplane defined by the equality constraints and the convex set defined by the bounds. The multipliers for the bound constraints are shown to diverge along the range space of the equality constraints. Using this characterization, we also propose a termination criterion for ADMM. Numerical examples are provided to illustrate the theory through experiments.
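Here is a minimal sketch of the splitting the abstract describes, alternating an equality-constrained QP step (one KKT solve) with a projection onto the bounds; the problem data and the step-size parameter rho are illustrative, and no attempt is made to reproduce the paper's optimal step-size rule.

```python
# Sketch of ADMM for a QP with equality and bound constraints:
#   minimize (1/2) x^T Q x + q^T x   s.t.  A x = b,  l <= x <= u,
# alternating an equality-constrained QP step (a KKT solve) with a
# projection onto the bounds. Data and rho are illustrative.
import numpy as np

def admm_qp(Q, q, A, b, l, u, rho=1.0, iters=300):
    n, m = Q.shape[0], A.shape[0]
    z, w = np.zeros(n), np.zeros(n)
    # KKT matrix for the equality-constrained x-update (factor once in practice)
    K = np.block([[Q + rho * np.eye(n), A.T], [A, np.zeros((m, m))]])
    for _ in range(iters):
        rhs = np.concatenate([rho * (z - w) - q, b])
        x = np.linalg.solve(K, rhs)[:n]   # equality-constrained QP step
        z = np.clip(x + w, l, u)          # projection onto the bounds
        w = w + x - z                     # scaled dual update
    return z

Q = np.diag([2.0, 4.0]); q = np.array([-2.0, 0.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
l = np.zeros(2); u = np.full(2, 0.6)
print(admm_qp(Q, q, A, b, l, u))   # approx. [0.6, 0.4] for this toy QP
```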
When designing controllers with robust performance and stabilization requirements, H-infinity synthesis is a common tool to use. These controllers are often obtained by solving mathematical optimization …
The proof of these remarks can be found in the author's paper [4] on the Weierstrass E-function. This paper deals with the parametric problem. However, the hypotheses are such that the results are equally applicable to the nonparametric case.
A powerful iterative descent method for finding a local minimum of a function of several variables is described. A number of theorems are proved to show that it always converges and that it converges rapidly. Numerical tests on a variety of functions confirm these theorems. The method has been used to solve a system of one hundred non-linear simultaneous equations.
An iterative algorithm is given for solving a system Ax=k of n linear equations in n unknowns. The solution is given in n steps. It is shown that this method is a special case of a very general method which also includes Gaussian elimination. These general algorithms are essentially algorithms for finding an n dimensional ellipsoid. Connections are made with the theory of orthogonal polynomials and continued fractions.
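The algorithm described here is the conjugate gradient method of Hestenes and Stiefel; a minimal sketch for symmetric positive definite A follows, terminating in at most n steps in exact arithmetic. The test system is an illustrative example.

```python
# Sketch of the conjugate gradient iteration for A x = k with A symmetric
# positive definite; in exact arithmetic it terminates in at most n steps,
# as the abstract states. Data are illustrative.
import numpy as np

def conjugate_gradient(A, k, tol=1e-10):
    x = np.zeros_like(k)
    r = k - A @ x          # residual
    p = r.copy()           # search direction
    rs = r @ r
    for _ in range(len(k)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # A-conjugate direction update
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
k = np.array([1.0, 2.0])
print(conjugate_gradient(A, k))   # approx. [1/11, 7/11]
```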