This paper concerns the composite problem of minimizing the sum of a twice continuously differentiable function $f$ and a nonsmooth convex function. For this class of nonconvex and nonsmooth problems, by leveraging a practical inexactness criterion and a novel selection strategy for iterates, we propose an inexact $q\in[2,3]$-order regularized proximal Newton method, which becomes an inexact cubic regularization (CR) method for $q=3$. We justify that its iterate sequence converges to a stationary point for the KL objective function, and if the objective function has the KL property of exponent $\theta\in(0,\frac{q-1}{q})$, the convergence has a local $Q$-superlinear rate of order $\frac{q-1}{\theta q}$. In particular, under a locally H\"{o}lderian error bound of order $\gamma\in(\frac{1}{q-1},1]$ on a second-order stationary point set, the iterate sequence converges to a second-order stationary point with a local $Q$-superlinear rate of order $\gamma(q\!-\!1)$, which specializes to a $Q$-quadratic rate for $q=3$ and $\gamma=1$. This is the first practical inexact CR method with a $Q$-quadratic convergence rate for nonconvex composite optimization. We validate the efficiency of the proposed method, with ZeroFPR as the solver of the subproblems, by applying it to convex and nonconvex composite problems with a highly nonlinear $f$.
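For orientation, a minimal sketch of the subproblem such a $q$-order regularized proximal Newton step solves at an iterate $x^k$ is the following (the symbols $g$ for the nonsmooth term and $\sigma_k>0$ for the regularization parameter are ours, and the scaling of the regularization term may differ from the paper's):
\[
x^{k+1}\approx\arg\min_{x}\ \Big\{\langle\nabla f(x^k),x-x^k\rangle+\tfrac12\langle\nabla^2 f(x^k)(x-x^k),x-x^k\rangle+\tfrac{\sigma_k}{q}\|x-x^k\|^{q}+g(x)\Big\},
\]
so that $q=3$ recovers a cubic-regularized proximal Newton subproblem, and the inexactness criterion governs how accurately this subproblem must be solved.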
In this paper, we propose an inexact proximal Newton-type method for nonconvex composite problems. We establish a global convergence rate of order \(\mathcal{O}(k^{-1/2})\) in terms of the minimal norm of the KKT residual mapping, and a local superlinear convergence rate for the iterate sequence generated by the proposed algorithm under a higher-order metric \(q\)-subregularity property. When the Lipschitz constant of the corresponding gradient is known, we show that the proposed algorithm is well-defined without line search. Extensive numerical experiments on the \(\ell_1\)-regularized Student's \(t\)-regression and the group penalized Student's \(t\)-regression show that the performance of the proposed method is comparable to that of state-of-the-art proximal Newton-type methods.
This paper focuses on the minimization of the sum of a twice continuously differentiable function $f$ and a nonsmooth convex function. An inexact regularized proximal Newton method is proposed, based on an approximation to the Hessian of $f$ involving the $\varrho$th power of the KKT residual. For $\varrho=0$, we justify the global convergence of the iterate sequence for the KL objective function and its R-linear convergence rate for the KL objective function of exponent $1/2$. For $\varrho\in(0,1)$, by assuming that cluster points satisfy a locally H\"{o}lderian error bound of order $q$ on a second-order stationary point set and a local error bound of order $q>1\!+\!\varrho$ on the common stationary point set, respectively, we establish the global convergence of the iterate sequence and its superlinear convergence rate with order depending on $q$ and $\varrho$. A dual semismooth Newton augmented Lagrangian method is also developed for seeking an inexact minimizer of the subproblems. Numerical comparisons with two state-of-the-art methods on $\ell_1$-regularized Student's $t$-regressions, group penalized Student's $t$-regressions, and nonconvex image restoration confirm the efficiency of the proposed method.
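As an illustration of this construction (with $R(\cdot)$ denoting the KKT residual mapping, $g$ the nonsmooth convex term, and $c_1,c_2>0$ placeholder constants of ours), the regularized Hessian and the resulting subproblem take roughly the form
\[
G_k=\nabla^2 f(x^k)+\big(c_1\|R(x^k)\|^{\varrho}+c_2\big)I,\qquad
x^{k+1}\approx\arg\min_{x}\ \Big\{\langle\nabla f(x^k),x-x^k\rangle+\tfrac12\langle G_k(x-x^k),x-x^k\rangle+g(x)\Big\},
\]
where $\varrho=0$ corresponds to a regularization that does not adapt to the KKT residual.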
Optimization problems with composite functions consist of an objective function which is the sum of a smooth and a (convex) nonsmooth term. This particular structure is exploited by the class of proximal gradient methods and some of their generalizations like proximal Newton and quasi-Newton methods. In this paper, we propose a regularized proximal quasi-Newton method whose main features are: (a) the method is globally convergent to stationary points, (b) the globalization is controlled by a regularization parameter, no line search is required, (c) the method can be implemented very efficiently based on a simple observation which combines recent ideas for the computation of quasi-Newton proximity operators and compact representations of limited-memory quasi-Newton updates. Numerical examples for the solution of convex and nonconvex composite optimization problems indicate that the method outperforms several existing methods.
Optimization problems with composite functions consist of an objective function which is the sum of a smooth and a (convex) nonsmooth term. This particular structure is exploited by the class of proximal gradient methods and some of their generalizations, like proximal Newton and quasi-Newton methods. The current literature on these classes of methods almost exclusively considers the case where the smooth term is also convex. Here we present a globalized proximal Newton-type method which allows the smooth term to be nonconvex. The method is shown to have favorable global and local convergence properties, and some numerical results indicate that this method is very promising also from a practical point of view.
This paper describes and establishes the iteration-complexity of a doubly accelerated inexact proximal point (D-AIPP) method for solving the nonconvex composite minimization problem whose objective function is of the form $f+h$ where $f$ is a (possibly nonconvex) differentiable function whose gradient is Lipschitz continuous and $h$ is a closed convex function with bounded domain. D-AIPP performs two types of iterations, namely, inner and outer ones. Its outer iterations correspond to the ones of the accelerated inexact proximal point scheme. Its inner iterations are the ones performed by an accelerated composite gradient method for inexactly solving the convex proximal subproblems generated during the outer iterations. Thus, D-AIPP employs both inner and outer accelerations.
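Schematically, each outer iteration of such accelerated inexact proximal point schemes inexactly solves a proximal subproblem of the form (with $\lambda>0$ a prox stepsize chosen so that the subproblem is convex; notation ours)
\[
x^{k}\approx\arg\min_{x}\ \Big\{f(x)+h(x)+\tfrac{1}{2\lambda}\|x-x^{k-1}\|^{2}\Big\},
\]
and the inner accelerated composite gradient iterations are the ones that deliver this approximate solution.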
This work presents an adaptive superfast proximal augmented Lagrangian (AS-PAL) method for solving linearly-constrained smooth nonconvex composite optimization problems. Each iteration of AS-PAL inexactly solves a possibly nonconvex proximal augmented Lagrangian (AL) subproblem obtained by an aggressive/adaptive choice of prox stepsize, with the aim of substantially improving its computational performance, followed by a full Lagrange multiplier update. A major advantage of AS-PAL compared to other AL methods is that it requires no knowledge of parameters associated with the optimization problem (e.g., the size of the constraint matrix or the curvature of the objective function), due to its adaptive nature not only in choosing the prox stepsize but also in using a crucial adaptive accelerated composite gradient variant to solve the proximal AL subproblems. The speed and efficiency of AS-PAL are demonstrated through extensive computational experiments showing that it can solve many instances more than ten times faster than other state-of-the-art penalty and AL methods, particularly when high accuracy is required.
In this paper, we consider the composite optimization problem, where the objective function integrates a continuously differentiable loss function with a nonsmooth regularization term. Moreover, only the function values for the differentiable part of the objective function are available. To efficiently solve this composite optimization problem, we propose a preconditioned zeroth-order proximal gradient method in which the gradients and preconditioners are estimated by finite-difference schemes based on the function values at the same trial points. We establish the global convergence and worst-case complexity for our proposed method. Numerical experiments exhibit the superiority of our developed method.
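As a rough sketch of the ingredients (not necessarily the authors' exact scheme), a forward-difference gradient estimate and the corresponding preconditioned proximal step read, for a smoothing parameter $\mu>0$, a stepsize $\alpha_k>0$, a preconditioner $H_k\succ0$ also built from function values, and nonsmooth regularizer $r$ (all notation ours),
\[
\widehat{\nabla} f(x^k)=\sum_{i=1}^{n}\frac{f(x^k+\mu e_i)-f(x^k)}{\mu}\,e_i,\qquad
x^{k+1}\in\arg\min_{x}\ \Big\{\langle\widehat{\nabla} f(x^k),x-x^k\rangle+\tfrac{1}{2\alpha_k}\|x-x^k\|_{H_k}^{2}+r(x)\Big\},
\]
where $e_i$ denotes the $i$-th coordinate direction and the same trial points serve to estimate both the gradient and the preconditioner.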
This paper proposes an efficient adaptive variant of a quadratic penalty accelerated inexact proximal point (QP-AIPP) method proposed earlier by the authors. Both the QP-AIPP method and its variant solve linearly set constrained nonconvex composite optimization problems using a quadratic penalty approach where the generated penalized subproblems are solved by a variant of the underlying AIPP method. The variant, in turn, solves a given penalized subproblem by generating a sequence of proximal subproblems which are then solved by an accelerated composite gradient algorithm. The main difference between AIPP and its variant is that the proximal subproblems in the former are always convex while the ones in the latter are not necessarily convex due to the fact that their prox parameters are chosen as aggressively as possible so as to improve efficiency. The possibly nonconvex proximal subproblems generated by the AIPP variant are also tentatively solved by a novel adaptive accelerated composite gradient algorithm based on the validity of some key convergence inequalities. As a result, the variant generates a sequence of proximal subproblems where the stepsizes are adaptively changed according to the responses obtained from the calls to the accelerated composite gradient algorithm. Finally, numerical results are given to demonstrate the efficiency of the proposed AIPP and QP-AIPP variants.
We propose an inexact proximal augmented Lagrangian framework with an explicit inner-problem termination rule for composite convex optimization problems. We consider arbitrary linearly convergent inner solvers, including in particular stochastic algorithms, making the resulting framework more scalable in the face of ever-increasing problem dimensions. Each subproblem is solved inexactly with an explicit and self-adaptive stopping criterion, without requiring an a priori target accuracy to be set. When the primal and dual domains are bounded, our method achieves $O(1/\sqrt{\epsilon})$ and $O(1/{\epsilon})$ complexity bounds in terms of the number of inner solver iterations, for the strongly convex and the non-strongly convex case, respectively. Without the boundedness assumption, only logarithmic terms need to be added, and the above two complexity bounds increase respectively to $\tilde O(1/\sqrt{\epsilon})$ and $\tilde O(1/{\epsilon})$, which hold both for obtaining an $\epsilon$-optimal and an $\epsilon$-KKT solution. Within the general framework that we propose, we also obtain $\tilde O(1/{\epsilon})$ and $\tilde O(1/{\epsilon^2})$ complexity bounds under a relative smoothness assumption on the differentiable component of the objective function. We show through theoretical analysis as well as numerical experiments the computational speedup possibly achieved by the use of randomized inner solvers for large-scale problems.
Discover New Methods for Dealing with High-Dimensional Data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of $\ell_1$ penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.
Due to their property of enhancing the sparsity of solutions, $\ell_1$-regularized optimization problems have developed into a highly dynamic research area with a wide range of applications. We present a class of methods for $\ell_1$-regularized optimization problems that are based on a combination of semismooth Newton steps, a filter globalization, and shrinkage/thresholding steps. A multidimensional filter framework is used to control the acceptance and to evaluate the quality of the semismooth Newton steps. If the current Newton iterate is rejected, a shrinkage/thresholding-based step with a quasi-Armijo stepsize rule is used instead. Global convergence and transition to local q-superlinear convergence for both convex and nonconvex objective functions are established. We present numerical results and comparisons with several state-of-the-art methods that show the efficiency and competitiveness of the proposed method.
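For reference, the shrinkage/thresholding fallback step is built on the closed-form proximal map of the $\ell_1$ norm (soft thresholding); with stepsize $\tau>0$ and regularization weight $\lambda>0$ (notation ours) it reads
\[
\big(\mathcal{S}_{\lambda\tau}(v)\big)_i=\operatorname{sign}(v_i)\max\{|v_i|-\lambda\tau,\,0\},\qquad
x^{k+1}=\mathcal{S}_{\lambda\tau}\big(x^k-\tau\nabla f(x^k)\big),
\]
i.e. the classical ISTA-type update, while the semismooth Newton steps accelerate it whenever the filter accepts them.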
A new type of method for unconstrained optimization, called a tensor method, is introduced. It is related in its basic philosophy to the tensor methods for nonlinear equations of Schnabel and Frank [SIAM J. Numer. Anal., 21 (1984), pp. 815–843], but beyond that the methods have significant differences. The tensor method for unconstrained optimization bases each iteration upon a fourth order model of the objective function. This model consists of the quadratic portion of the Taylor series, plus low-rank third and fourth order terms that cause the model to interpolate already calculated function and gradient values from one or more previous iterates. This paper also shows that the costs of forming, storing, and solving the tensor model are not significantly more than these costs for a standard method based upon a quadratic Taylor series model. Test results are presented for sets of problems where the Hessian at the minimizer is nonsingular, and where it is singular. On all the test sets, the tensor method solves considerably more problems than a comparable standard method. On problems solved by both methods, the tensor method requires about half as many iterations, and half as many function and derivative evaluations, as the standard method, on average.
We derive two-point step sizes for the steepest-descent method by approximating the secant equation. At the cost of storage of an extra iterate and gradient, these algorithms achieve better performance and cheaper computation than the classical steepest-descent method. We indicate a convergence analysis of the method in the two-dimensional quadratic case. The behaviour is highly remarkable and the analysis entirely nonstandard.
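The two-point (Barzilai--Borwein) step sizes referred to follow from approximating the secant equation; with $s_{k-1}=x_k-x_{k-1}$ and $y_{k-1}=\nabla f(x_k)-\nabla f(x_{k-1})$ they are
\[
\alpha_k^{\mathrm{BB1}}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}},\qquad
\alpha_k^{\mathrm{BB2}}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}},
\]
used in the gradient update $x_{k+1}=x_k-\alpha_k\nabla f(x_k)$ (the superscripts BB1/BB2 are a common labeling, not the paper's).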
This paper is mainly devoted to the study and applications of Hölder metric subregularity (or metric $q$-subregularity of order $q\in(0,1]$) for general set-valued mappings between infinite-dimensional spaces. Employing advanced techniques of variational analysis and generalized differentiation, we derive neighborhood and point-based sufficient conditions as well as necessary conditions for $q$-metric subregularity with evaluating the exact subregularity bound, which are new even for the conventional (first-order) metric subregularity in both finite and infinite dimensions. In this way we also obtain new fractional error bound results for composite polynomial systems with explicit calculating fractional exponents. Finally, metric $q$-subregularity is applied to conduct a quantitative convergence analysis of the classical proximal point method (PPM) for finding zeros of maximal monotone operators on Hilbert spaces.
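For orientation, a set-valued map $\Phi$ is metrically $q$-subregular at $(\bar x,\bar y)\in\operatorname{gph}\Phi$ with modulus $\kappa>0$ if, for all $x$ near $\bar x$,
\[
\operatorname{dist}\big(x,\Phi^{-1}(\bar y)\big)\le\kappa\,\operatorname{dist}\big(\bar y,\Phi(x)\big)^{q},
\]
and $q=1$ recovers the usual (first-order) metric subregularity.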
We study the convergence properties of an alternating proximal minimization algorithm for nonconvex structured functions of the type $L(x,y)=f(x)+Q(x,y)+g(y)$, where $f$ and $g$ are proper lower semicontinuous functions, defined on Euclidean spaces, and $Q$ is a smooth function that couples the variables $x$ and $y$. The algorithm can be viewed as a proximal regularization of the usual Gauss-Seidel method to minimize $L$. We work in a nonconvex setting, just assuming that the function $L$ satisfies the Kurdyka-Łojasiewicz inequality. An entire section illustrates the relevancy of such an assumption by giving examples ranging from semialgebraic geometry to "metrically regular" problems. Our main result can be stated as follows: if $L$ has the Kurdyka-Łojasiewicz property, then each bounded sequence generated by the algorithm converges to a critical point of $L$. This result is completed by the study of the convergence rate of the algorithm, which depends on the geometrical properties of the function $L$ around its critical points. When specialized to $Q(x,y)=\|x-y\|^2$ and to $f$, $g$ indicator functions, the algorithm is an alternating projection method (a variant of von Neumann's) that converges for a wide class of sets including semialgebraic and tame sets, transverse smooth manifolds, or sets with "regular" intersection. To illustrate our results with concrete problems, we provide a convergent proximal reweighted $\ell_1$ algorithm for compressive sensing and an application to rank reduction problems.
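In the notation of this abstract (and up to the precise weighting of the proximal terms, with $\lambda_k,\mu_k>0$), the algorithm alternates proximally regularized minimizations in each block:
\[
x_{k+1}\in\arg\min_{x}\ \Big\{L(x,y_k)+\tfrac{1}{2\lambda_k}\|x-x_k\|^2\Big\},\qquad
y_{k+1}\in\arg\min_{y}\ \Big\{L(x_{k+1},y)+\tfrac{1}{2\mu_k}\|y-y_k\|^2\Big\}.
\]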
We consider a variable metric linesearch based proximal gradient method for the minimization of the sum of a smooth, possibly nonconvex function plus a convex, possibly nonsmooth term. We prove convergence of this iterative algorithm to a critical point if the objective function satisfies the Kurdyka-Łojasiewicz property at each point of its domain, under the assumption that a limit point exists. The proposed method is applied to a wide collection of image processing problems, and our numerical tests show that our algorithm proves to be flexible, robust, and competitive when compared to recently proposed approaches able to address the optimization problems arising in the considered applications.
In this paper, we study regularized second-order methods for unconstrained minimization of a twice-differentiable (convex or nonconvex) objective function. For the current function, these methods automatically achieve the best possible global complexity estimates among different Hölder classes containing the Hessian of the objective. We show that such methods for the functional residual and for the norm of the gradient must be different. For the development of the latter methods, we introduce two new line-search acceptance criteria, which can be seen as generalizations of the Armijo condition.
In this paper, we study accelerated regularized Newton methods for minimizing objectives formed as a sum of two functions: one is convex and twice differentiable with Hölder-continuous Hessian, and the other is a simple closed convex function. For the case in which the Hölder parameter $\nu\in [0,1]$ is known, we propose methods that take at most $\mathcal{O}\big({1 \over \epsilon^{1/(2+\nu)}}\big)$ iterations to reduce the functional residual below a given precision $\epsilon > 0$. For the general case, in which $\nu$ is not known, we propose a universal method that ensures the same precision in at most $\mathcal{O}\big({1 \over \epsilon^{2/[3(1+\nu)]}}\big)$ iterations without using $\nu$ explicitly in the scheme.
In this paper, we study the iteration complexity of the cubic regularization of Newton's method for solving composite minimization problems with a uniformly convex objective. We introduce the notion of a second-order condition number of a certain degree and justify the linear rate of convergence in a nondegenerate case for the method with an adaptive estimate of the regularization parameter. The algorithm automatically achieves the best possible global complexity bound among different problem classes of uniformly convex objective functions with Hölder continuous Hessian of the smooth part of the objective. As a byproduct of our developments, we justify an intuitively plausible result that the global iteration complexity of Newton's method is always better than that of the gradient method on the class of strongly convex functions with uniformly bounded second derivative.
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
In this paper we consider the cubic regularization (CR) method, a regularized version of the classical Newton method, for minimizing a twice continuously differentiable function. While it is well known that the CR method is globally convergent and enjoys a superior global iteration complexity, existing results on its local quadratic convergence require a stringent nondegeneracy condition. We prove that under a local error bound (EB) condition, which is a much weaker requirement than the existing nondegeneracy condition, the sequence of iterates generated by the CR method converges at least Q-quadratically to a second-order critical point. This indicates that adding cubic regularization not only equips Newton's method with remarkable global convergence properties but also enables it to converge quadratically even in the presence of degenerate solutions. As a by-product, we show that without assuming convexity the proposed EB condition is equivalent to a quadratic growth condition, which could be of independent interest. To demonstrate the usefulness and relevance of our convergence analysis, we focus on two concrete nonconvex optimization problems that arise in phase retrieval and low-rank matrix recovery and prove that with overwhelming probability the sequence of iterates generated by the CR method for solving these two problems converges at least Q-quadratically to a global minimizer. To support and complement our theoretical development, we also present numerical results of the CR method when applied to solve these two problems.
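For context, the CR step and the local error bound condition discussed take the following generic forms (with $\sigma_k>0$ the cubic regularization parameter, whose normalization varies across conventions, $\mathcal{X}$ the critical point set considered, and $\kappa,r>0$; notation ours):
\[
x^{k+1}\in\arg\min_{x}\ \Big\{\langle\nabla f(x^k),x-x^k\rangle+\tfrac12\langle\nabla^2 f(x^k)(x-x^k),x-x^k\rangle+\tfrac{\sigma_k}{3}\|x-x^k\|^{3}\Big\},
\]
\[
\operatorname{dist}(x,\mathcal{X})\le\kappa\,\|\nabla f(x)\|\quad\text{whenever }\operatorname{dist}(x,\mathcal{X})\le r,
\]
the latter being much weaker than requiring a nonsingular Hessian at an isolated minimizer.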
In this paper we study sufficient conditions for metric subregularity of a set-valued map which is the sum of a single-valued continuous map and a locally closed subset. First we derive a sufficient condition for metric subregularity which is weaker than the so-called first-order sufficient condition for metric subregularity (FOSCMS) by adding an extra sequential condition. Then we introduce directional versions of quasi-normality and pseudo-normality which are stronger than the new weak sufficient condition for metric subregularity but weaker than classical quasi-normality and pseudo-normality. Moreover we introduce a nonsmooth version of the second-order sufficient condition for metric subregularity and show that it is a sufficient condition for the new sufficient condition for metric subregularity to hold. An example is used to illustrate that directional pseudo-normality can be weaker than FOSCMS. For the class of set-valued maps where the single-valued mapping is affine and the abstract set is the union of finitely many convex polyhedral sets, we show that pseudo-normality and hence directional pseudo-normality holds automatically at each point of the graph. Finally we apply our results to complementarity and Karush--Kuhn--Tucker systems.
Adaptive cubic regularization methods have emerged as a credible alternative to linesearch and trust-region for smooth nonconvex optimization, with optimal complexity amongst second-order methods. Here we consider a general/new class of adaptive regularization methods that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size and applied to convexly constrained optimization problems. We investigate the worst-case evaluation complexity/global rate of convergence of these algorithms, when the level of sufficient smoothness of the objective may be unknown or may even be absent. We find that the methods accurately reflect in their complexity the degree of smoothness of the objective and satisfy increasingly better bounds with improving model accuracy. The bounds vary continuously and robustly with respect to the regularization power and accuracy of the model and the degree of smoothness of the objective.
A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima
In this paper, we study local convergence of high-order Tensor Methods for solving convex optimization problems with composite objective. We justify local superlinear convergence under the assumption of uniform convexity of the smooth component, having Lipschitz-continuous high-order derivative. The convergence both in function value and in the norm of minimal subgradient is established. Global complexity bounds for the Composite Tensor Method in convex and uniformly convex cases are also discussed. Lastly, we show how local convergence of the methods can be globalized using the inexact proximal iterations.
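Concretely, one step of a $p$-th order composite tensor method minimizes a regularized Taylor model of the smooth part; up to the exact normalization of the regularization coefficient $H>0$ (and with $\psi$ the nonsmooth component, $D^{i}f(x)[h]^{i}$ the $i$-th directional derivative), the step is
\[
x^{+}\in\arg\min_{y}\ \Big\{\sum_{i=1}^{p}\frac{1}{i!}D^{i}f(x)[y-x]^{i}+\frac{H}{(p+1)!}\|y-x\|^{p+1}+\psi(y)\Big\}.
\]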
The first moment and the second central moment of the portfolio return, a.k.a. the mean and variance, have been widely employed to assess the expected profit and risk of a portfolio. Investors pursue a higher mean and lower variance when designing portfolios. These two moments can describe the distribution of the portfolio return well when it follows a Gaussian distribution. However, the real-world distribution of asset returns is usually asymmetric and heavy-tailed, which is far from Gaussian. The asymmetry and the heavy-tailedness are characterized by the third and fourth central moments, i.e., skewness and kurtosis, respectively. Higher skewness and lower kurtosis are preferred to reduce the probability of extreme losses. However, incorporating high-order moments in the portfolio design is very difficult due to their nonconvexity and rapidly increasing computational cost with the dimension. In this paper, we propose a very efficient and convergence-provable algorithm framework based on the successive convex approximation (SCA) algorithm to solve high-order portfolio design problems. The efficiency of the proposed algorithm framework is demonstrated by numerical experiments.
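A typical high-order (mean-variance-skewness-kurtosis) design problem of the kind addressed is, for trade-off weights $\lambda_1,\ldots,\lambda_4\ge0$ and a feasible set $\mathcal{W}$ (both placeholders of ours),
\[
\min_{w\in\mathcal{W}}\ -\lambda_1\,\phi_1(w)+\lambda_2\,\phi_2(w)-\lambda_3\,\phi_3(w)+\lambda_4\,\phi_4(w),
\]
where $\phi_1,\ldots,\phi_4$ denote the portfolio mean, variance, skewness, and kurtosis; the third- and fourth-moment terms make the problem nonconvex, which is where the SCA framework enters.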
In this paper, we study a fully composite formulation of convex optimization problems, which includes, as particular cases, problems with functional constraints, max-type minimization problems, and problems with simple nondifferentiable components. We treat all these formulations in a unified way, highlighting the existence of very natural optimization schemes of different order $p \geq 1$. As a result, we obtain new high-order $(p \geq 2)$ optimization methods for the composite formulation. We prove global convergence rates for them under the most general conditions. Assuming that the upper-level component of our objective function is subhomogeneous, we develop an efficient modification of the basic fully composite first-order and second-order methods and propose their accelerated variants.
In this paper, we introduce an inexact regularized proximal Newton method (IRPNM) that does not require any line search. The method is designed to minimize the sum of a twice continuously differentiable function $f$ and a convex (possibly nonsmooth and extended-valued) function $\varphi$. Instead of controlling a step size by a line search procedure, we update the regularization parameter in a suitable way, based on the success of the previous iteration. The global convergence of the iterate sequence and its superlinear convergence rate under a local Hölderian error bound assumption are shown. Notably, these convergence results are obtained without requiring a global Lipschitz property for $\nabla f$, which, to the best of the authors' knowledge, is a novel contribution for proximal Newton methods. To highlight the efficiency of our approach, we provide numerical comparisons with an IRPNM using a line search globalization and a modern FISTA-type method.
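A generic version of the line-search-free globalization alluded to (not necessarily the authors' exact rule; all symbols below are ours) measures the quality of a trial step $d^k$ by a reduction ratio and adjusts the regularization parameter $\mu_k$ accordingly:
\[
\rho_k=\frac{F(x^k)-F(x^k+d^k)}{m_k(0)-m_k(d^k)},\qquad
\mu_{k+1}=\begin{cases}c_1\mu_k, & \text{if }\rho_k\text{ is sufficiently large (successful step)},\\ c_2\mu_k, & \text{otherwise},\end{cases}
\]
where $F=f+\varphi$, $m_k$ is the regularized quadratic model used in the subproblem, and $0<c_1<1<c_2$, so that successful iterations relax the regularization while unsuccessful ones tighten it.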