Using the antitrace map method, we investigate light transmission through generalized Fibonacci multilayers. Analytical results are obtained for the transmission coefficients in some special cases. We find that the transmission coefficients possess a two-cycle or six-cycle property. The cycle properties of the trace and antitrace are also obtained.
In this paper, we analyse the problem of the optical realization of the Walsh transformation and give the design of an optical system, composed of holographic lenses, that can be used to realize the 8×8 complex Walsh transformation in two-dimensional space.
For high-dimensional sparse linear regression problems, we propose a sequential convex relaxation algorithm (iSCRA-TL1) that inexactly solves a sequence of truncated $\ell_1$-norm regularized minimization problems, in which the working index sets are constructed iteratively with an adaptive strategy. We employ the robust restricted null space property and the sequential restricted null space property (rRNSP and rSRNSP) to provide theoretical certificates for iSCRA-TL1. Specifically, under a mild rRNSP or rSRNSP, iSCRA-TL1 is shown to identify the support of the true $r$-sparse vector by solving at most $r$ truncated $\ell_1$-norm regularized problems, and an $\ell_1$-norm error bound of its iterates to the oracle solution is also established. As a consequence, an oracle estimator of high-dimensional linear regression problems can be obtained by solving at most $r\!+\!1$ truncated $\ell_1$-norm regularized problems. To the best of our knowledge, this is the first sequential convex relaxation algorithm to produce an oracle estimator under a weaker NSP condition within a specified number of steps, even when the Lasso estimator lacks high quality, i.e., the support of its $r$ largest (in modulus) entries does not coincide with that of the true vector.
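For concreteness, a truncated $\ell_1$-norm regularized subproblem of the kind solved by such a scheme typically penalizes only the entries outside the current working index set $T_k$; the following is a minimal generic sketch (the exact subproblem, constraints and parameters of iSCRA-TL1 are given in the paper, not here):
\[
x^{k+1}\ \approx\ \mathop{\arg\min}_{x\in\mathbb{R}^p}\ \tfrac{1}{2}\|Ax-b\|^2+\lambda_k\sum_{i\notin T_k}|x_i|,
\]
so that as $T_k$ grows to cover the true support, the penalty acts only on the remaining (presumably zero) coordinates.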
We propose a class of nonconvex and nonsmooth image reconstruction models for impulse noise removal, which is rather general and covers existing nonconvex and nonsmooth models for reconstructing images with impulse noise. For this class of models, we develop an inexact proximal majorization-minimization algorithm with an implementable inexactness criterion by constructing a strongly convex local majorization of the objective function and seeking its inexact minimizer in each iteration, and provide its theoretical certificate by proving the full convergence of the iterate sequence under the Kurdyka-{\L}ojasiewicz property of the constructed potential function. This inexact proximal majorization-minimization method is applied to linear image restoration problems from deblurring and inpainting, and to a class of nonlinear image reconstruction problems from Fourier phase retrieval. Numerical comparisons with the proximal linearized minimization algorithm for image deblurring and the alternating direction method of multipliers for image inpainting and Fourier phase retrieval validate the efficiency of the proposed method.
We study the convergence properties of an alternating proximal minimization algorithm for nonconvex structured functions of the type: L(x,y)=f(x)+Q(x,y)+g(y), where f and g are proper lower semicontinuous functions, defined on Euclidean spaces, and Q is a smooth function that couples the variables x and y. The algorithm can be viewed as a proximal regularization of the usual Gauss-Seidel method to minimize L. We work in a nonconvex setting, just assuming that the function L satisfies the Kurdyka-Łojasiewicz inequality. An entire section illustrates the relevancy of such an assumption by giving examples ranging from semialgebraic geometry to “metrically regular” problems. Our main result can be stated as follows: If L has the Kurdyka-Łojasiewicz property, then each bounded sequence generated by the algorithm converges to a critical point of L. This result is completed by the study of the convergence rate of the algorithm, which depends on the geometrical properties of the function L around its critical points. When specialized to [Formula: see text] and to f, g indicator functions, the algorithm is an alternating projection method (a variant of von Neumann's) that converges for a wide class of sets including semialgebraic and tame sets, transverse smooth manifolds or sets with “regular” intersection. To illustrate our results with concrete problems, we provide a convergent proximal reweighted $\ell_1$ algorithm for compressive sensing and an application to rank reduction problems.
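For orientation, the proximally regularized Gauss-Seidel iteration described above alternates, with stepsizes $\lambda_k,\mu_k>0$,
\[
x_{k+1}\in\mathop{\arg\min}_{x}\Big\{L(x,y_k)+\tfrac{1}{2\lambda_k}\|x-x_k\|^2\Big\},\qquad
y_{k+1}\in\mathop{\arg\min}_{y}\Big\{L(x_{k+1},y)+\tfrac{1}{2\mu_k}\|y-y_k\|^2\Big\},
\]
where the quadratic proximal terms are what distinguishes it from the plain Gauss-Seidel (alternating minimization) scheme.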
This paper focuses on the minimization of a sum of a twice continuously differentiable function $f$ and a nonsmooth convex function. An inexact regularized proximal Newton method is proposed by an approximation to the Hessian of $f$ involving the $\varrho$th power of the KKT residual. For $\varrho=0$, we justify the global convergence of the iterate sequence for the KL objective function and its R-linear convergence rate for the KL objective function of exponent 1/2. For $\varrho\in(0,1)$, by assuming that cluster points satisfy a locally Hölderian error bound of order $q$ on a second-order stationary point set and a local error bound of order $q>1\!+\!\varrho$ on the common stationary point set, respectively, we establish the global convergence of the iterate sequence and its superlinear convergence rate with order depending on $q$ and $\varrho$. A dual semismooth Newton augmented Lagrangian method is also developed for seeking an inexact minimizer of subproblems. Numerical comparisons with two state-of-the-art methods on $\ell_1$-regularized Student's $t$-regressions, group penalized Student's $t$-regressions, and nonconvex image restoration confirm the efficiency of the proposed method.
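As a rough sketch only (the precise subproblem and regularization rule are specified in the paper), one step of a regularized proximal Newton method of this type, writing $g$ for the nonsmooth convex part and $R(x^k)$ for a KKT residual at $x^k$ (generic notation), solves
\[
d^k\ \approx\ \mathop{\arg\min}_{d}\ \langle\nabla f(x^k),d\rangle+\tfrac12\langle(\nabla^2 f(x^k)+\mu_k I)d,d\rangle+g(x^k+d),
\qquad \mu_k\ \propto\ \|R(x^k)\|^{\varrho},
\]
so that the regularization vanishes at the rate of the residual; the case $\varrho=0$ corresponds to a constant-level regularization.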
Due to their property of enhancing the sparsity of solutions, $l_1$-regularized optimization problems have developed into a highly dynamic research area with a wide range of applications. We present a class of methods for $l_1$-regularized optimization problems that are based on a combination of semismooth Newton steps, a filter globalization, and shrinkage/thresholding steps. A multidimensional filter framework is used to control the acceptance and to evaluate the quality of the semismooth Newton steps. If the current Newton iterate is rejected, a shrinkage/thresholding-based step with a quasi-Armijo stepsize rule is used instead. Global convergence and transition to local q-superlinear convergence for both convex and nonconvex objective functions are established. We present numerical results and comparisons with several state-of-the-art methods that show the efficiency and competitiveness of the proposed method.
A new type of method for unconstrained optimization, called a tensor method, is introduced. It is related in its basic philosophy to the tensor methods for nonlinear equations of Schnabel and Frank [SIAM J. Numer. Anal., 21 (1984), pp. 815–843], but beyond that the methods have significant differences. The tensor method for unconstrained optimization bases each iteration upon a fourth-order model of the objective function. This model consists of the quadratic portion of the Taylor series, plus low-rank third- and fourth-order terms that cause the model to interpolate already calculated function and gradient values from one or more previous iterates. This paper also shows that the costs of forming, storing, and solving the tensor model are not significantly greater than the corresponding costs for a standard method based upon a quadratic Taylor series model. Test results are presented for sets of problems where the Hessian at the minimizer is nonsingular, and where it is singular. On all the test sets, the tensor method solves considerably more problems than a comparable standard method. On problems solved by both methods, the tensor method requires about half as many iterations, and half as many function and derivative evaluations, as the standard method on average.
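Schematically, the fourth-order tensor model referred to above augments the quadratic Taylor model with low-rank third- and fourth-order terms, denoted here generically by $T_k$ and $V_k$, chosen to interpolate previously computed function and gradient values (their precise low-rank construction is the content of the paper and is not reproduced here):
\[
m_k(x_k+d)=f(x_k)+\nabla f(x_k)^{\top}d+\tfrac12\,d^{\top}\nabla^2 f(x_k)\,d+\tfrac16\,T_k[d,d,d]+\tfrac1{24}\,V_k[d,d,d,d].
\]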
The purpose of this paper is to discuss the potential of nonmonotone techniques for enforcing convergence of unconstrained minimization algorithms from starting points distant from the solution. Linesearch-based algorithms are considered for both small and large problems, and extensive numerical experiments show that this potential is sometimes considerable. A new variant is introduced in order to limit some of the identified drawbacks of the existing techniques. This variant is again numerically tested and appears to be competitive. Finally, the impact of preconditioning on the considered methods is examined.
The word “tame” is used in the title in the same context as in expressions like “convex optimization,” “nonsmooth optimization,” etc.—as a reference to the class of objects involved in the formulation of optimization problems. Definable and tame functions and mappings associated with various o-minimal structures (e.g. semilinear, semialgebraic, globally subanalytic, and others) have a number of remarkable properties which make them an attractive domain for various applications. This relates both to the power of results that can be obtained and the power of available analytic techniques. The paper surveys certain ideas and recent results, some new, which have been or (hopefully) can be productively used in studies relating to variational analysis and nonsmooth optimization.
In this paper a nonmonotone steplength selection rule for Newton’s method is proposed, which can be viewed as a generalization of Armijo’s rule. Numerical results are reported which indicate that the proposed technique may allow a considerable saving both in the number of line searches and in the number of function evaluations.
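For reference, the nonmonotone rule proposed here relaxes the Armijo condition by comparing with the maximum of the most recent function values: a stepsize $\alpha_k$ along a descent direction $d_k$ is accepted once
\[
f(x_k+\alpha_k d_k)\ \le\ \max_{0\le j\le \min\{k,M\}} f(x_{k-j})\ +\ \gamma\,\alpha_k\,\nabla f(x_k)^{\top}d_k,
\]
with memory length $M\ge 0$ and $\gamma\in(0,1)$; taking $M=0$ recovers the classical Armijo rule.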
We derive two-point step sizes for the steepest-descent method by approximating the secant equation. At the cost of storage of an extra iterate and gradient, these algorithms achieve better performance and cheaper computation than the classical steepest-descent method. We indicate a convergence analysis of the method in the two-dimensional quadratic case. The behaviour is highly remarkable and the analysis entirely nonstandard.
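Concretely, writing $s_{k-1}=x_k-x_{k-1}$ and $y_{k-1}=\nabla f(x_k)-\nabla f(x_{k-1})$, the two step sizes obtained from the secant equation are
\[
\alpha_k^{\rm BB1}=\frac{s_{k-1}^{\top}s_{k-1}}{s_{k-1}^{\top}y_{k-1}},\qquad
\alpha_k^{\rm BB2}=\frac{s_{k-1}^{\top}y_{k-1}}{y_{k-1}^{\top}y_{k-1}},
\]
and the iteration is the gradient step $x_{k+1}=x_k-\alpha_k\nabla f(x_k)$ with one of these step sizes.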
In the early eighties Lojasiewicz [in Seminari di Geometria 1982-1983, Università di Bologna, Istituto di Geometria, Dipartimento di Matematica, 1984, pp. 115--117] proved that a bounded solution of a gradient flow for an analytic cost function converges to a well-defined limit point. In this paper, we show that the iterates of numerical descent algorithms, for an analytic cost function, share this convergence property if they satisfy certain natural descent conditions. The results obtained are applicable to a broad class of optimization schemes and strengthen classical "weak convergence" results for descent methods to "strong limit-point convergence" for a large class of cost functions of practical interest. The result does not require that the cost has isolated critical points and requires no assumptions on the convexity of the cost nor any nondegeneracy conditions on the Hessian of the cost at critical points.
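The key tool behind such results is the Łojasiewicz gradient inequality: for an analytic cost $f$ and a critical point $x^{*}$, there exist $c>0$, $\theta\in[1/2,1)$ and a neighborhood $U$ of $x^{*}$ such that
\[
|f(x)-f(x^{*})|^{\theta}\ \le\ c\,\|\nabla f(x)\|\qquad\text{for all }x\in U,
\]
which is what converts the natural descent conditions mentioned above into finite length, and hence strong limit-point convergence, of the iterate sequence.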
This paper is mainly devoted to the study and applications of Hölder metric subregularity (or metric $q$-subregularity of order $q\in(0,1]$) for general set-valued mappings between infinite-dimensional spaces. Employing advanced techniques of variational analysis and generalized differentiation, we derive neighborhood and point-based sufficient conditions as well as necessary conditions for $q$-metric subregularity with evaluating the exact subregularity bound, which are new even for the conventional (first-order) metric subregularity in both finite and infinite dimensions. In this way we also obtain new fractional error bound results for composite polynomial systems with explicit calculating fractional exponents. Finally, metric $q$-subregularity is applied to conduct a quantitative convergence analysis of the classical proximal point method (PPM) for finding zeros of maximal monotone operators on Hilbert spaces.
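For reference, a set-valued mapping $F$ is metrically $q$-subregular at $(\bar x,\bar y)\in\operatorname{gph}F$ with $q\in(0,1]$ if there exist $\kappa>0$ and a neighborhood $U$ of $\bar x$ such that
\[
\operatorname{dist}\big(x,F^{-1}(\bar y)\big)\ \le\ \kappa\,\operatorname{dist}\big(\bar y,F(x)\big)^{q}\qquad\text{for all }x\in U;
\]
the case $q=1$ is the conventional (first-order) metric subregularity.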
A new nonmonotone line search algorithm is proposed and analyzed. In our scheme, we require that an average of the successive function values decreases, while the traditional nonmonotone approach of Grippo, Lampariello, and Lucidi [SIAM J. Numer. Anal., 23 (1986), pp. 707--716] requires that a maximum of recent function values decreases. We prove global convergence for nonconvex, smooth functions, and R-linear convergence for strongly convex functions. For the L-BFGS method and the unconstrained optimization problems in the CUTE library, the new nonmonotone line search algorithm used fewer function and gradient evaluations, on average, than either the monotone or the traditional nonmonotone scheme.
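For reference, the averaged scheme of this paper replaces the maximum of recent function values by a weighted average $C_k$: with $C_0=f(x_0)$, $Q_0=1$ and parameters $\eta_k\in[0,1]$,
\[
Q_{k+1}=\eta_k Q_k+1,\qquad C_{k+1}=\frac{\eta_k Q_k C_k+f(x_{k+1})}{Q_{k+1}},
\]
and a stepsize $\alpha_k$ along a descent direction $d_k$ is accepted when $f(x_k+\alpha_k d_k)\le C_k+\delta\,\alpha_k\,\nabla f(x_k)^{\top}d_k$ for some $\delta\in(0,1)$.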
We consider a variable metric linesearch-based proximal gradient method for the minimization of the sum of a smooth, possibly nonconvex function plus a convex, possibly nonsmooth term. We prove convergence of this iterative algorithm to a critical point if the objective function satisfies the Kurdyka-Łojasiewicz property at each point of its domain, under the assumption that a limit point exists. The proposed method is applied to a wide collection of image processing problems, and our numerical tests show that the algorithm proves to be flexible, robust, and competitive when compared to recently proposed approaches able to address the optimization problems arising in the considered applications.
We consider the minimization of a cost function $f$ on a manifold $M$ using Riemannian gradient descent and Riemannian trust regions (RTR). We focus on satisfying necessary optimality conditions within a tolerance $\varepsilon$. Specifically, we show that, under Lipschitz-type assumptions on the pullbacks of $f$ to the tangent spaces of $M$, both of these algorithms produce points with Riemannian gradient smaller than $\varepsilon$ in $O(1/\varepsilon^2)$ iterations. Furthermore, RTR returns a point where also the Riemannian Hessian's least eigenvalue is larger than $-\varepsilon$ in $O(1/\varepsilon^3)$ iterations. There are no assumptions on initialization. The rates match their (sharp) unconstrained counterparts as a function of the accuracy $\varepsilon$ (up to constants) and hence are sharp in that sense. These are the first deterministic results for global rates of convergence to approximate first- and second-order Karush-Kuhn-Tucker points on manifolds. They apply in particular for optimization constrained to compact submanifolds of $\mathbb{R}^n$, under simpler assumptions.
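For orientation, the Riemannian gradient descent iteration analyzed here takes, for a retraction $R$ on $M$ and stepsizes $\alpha_k>0$,
\[
x_{k+1}=R_{x_k}\!\big(-\alpha_k\,\operatorname{grad}f(x_k)\big),
\]
i.e., it moves along the negative Riemannian gradient in the tangent space and maps back to the manifold with the retraction; the Lipschitz-type assumptions of the paper are imposed on the pullbacks $f\circ R_{x_k}$.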
In this paper, we study the regularized second-order methods for unconstrained minimization of a twice-differentiable (convex or nonconvex) objective function. For the current function, these methods automatically achieve the best possible global complexity estimates among different Hölder classes containing the Hessian of the objective. We show that such methods for the functional residual and for the norm of the gradient must be different. For the development of the latter methods, we introduce two new line-search acceptance criteria, which can be seen as generalizations of the Armijo condition.
In this paper, we study accelerated regularized Newton methods for minimizing objectives formed as a sum of two functions: one is convex and twice differentiable with Hölder-continuous Hessian, and the other is a simple closed convex function. For the case in which the Hölder parameter $\nu\in [0,1]$ is known, we propose methods that take at most $\mathcal{O}\big({1 \over \epsilon^{1/(2+\nu)}}\big)$ iterations to reduce the functional residual below a given precision $\epsilon > 0$. For the general case, in which $\nu$ is not known, we propose a universal method that ensures the same precision in at most $\mathcal{O}\big({1 \over \epsilon^{2/[3(1+\nu)]}}\big)$ iterations without using $\nu$ explicitly in the scheme.
In this paper, we study the iteration complexity of cubic regularization of Newton method for solving composite minimization problems with uniformly convex objective. We introduce the notion of second-order condition number of a certain degree and justify the linear rate of convergence in a nondegenerate case for the method with an adaptive estimate of the regularization parameter. The algorithm automatically achieves the best possible global complexity bound among different problem classes of uniformly convex objective functions with Hölder continuous Hessian of the smooth part of the objective. As a byproduct of our developments, we justify an intuitively plausible result that the global iteration complexity of the Newton method is always better than that of the gradient method on the class of strongly convex functions with uniformly bounded second derivative.
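For orientation, writing the composite objective as $f+h$ with $f$ the smooth part and $h$ the simple composite part (generic notation, not that of the paper), one iteration of the cubic regularization of Newton's method minimizes the cubic model
\[
x_{k+1}\in\mathop{\arg\min}_{y}\Big\{\langle\nabla f(x_k),y-x_k\rangle+\tfrac12\langle\nabla^2 f(x_k)(y-x_k),y-x_k\rangle+\tfrac{M_k}{6}\|y-x_k\|^{3}+h(y)\Big\},
\]
where $M_k>0$ is the regularization parameter that the method of this paper estimates adaptively.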
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
In this paper we consider the cubic regularization (CR) method, a regularized version of the classical Newton method, for minimizing a twice continuously differentiable function. While it is well known that the CR method is globally convergent and enjoys a superior global iteration complexity, existing results on its local quadratic convergence require a stringent nondegeneracy condition. We prove that under a local error bound (EB) condition, which is a much weaker requirement than the existing nondegeneracy condition, the sequence of iterates generated by the CR method converges at least Q-quadratically to a second-order critical point. This indicates that adding cubic regularization not only equips Newton's method with remarkable global convergence properties but also enables it to converge quadratically even in the presence of degenerate solutions. As a by-product, we show that without assuming convexity the proposed EB condition is equivalent to a quadratic growth condition, which could be of independent interest. To demonstrate the usefulness and relevance of our convergence analysis, we focus on two concrete nonconvex optimization problems that arise in phase retrieval and low-rank matrix recovery and prove that with overwhelming probability the sequence of iterates generated by the CR method for solving these two problems converges at least Q-quadratically to a global minimizer. To support and complement our theoretical development, we also present numerical results of the CR method when applied to solve these two problems.
In this paper we study sufficient conditions for metric subregularity of a set-valued map which is the sum of a single-valued continuous map and a locally closed subset. First we derive a sufficient condition for metric subregularity which is weaker than the so-called first-order sufficient condition for metric subregularity (FOSCMS) by adding an extra sequential condition. Then we introduce directional versions of quasi-normality and pseudo-normality which are stronger than the new weak sufficient condition for metric subregularity but weaker than classical quasi-normality and pseudo-normality. Moreover we introduce a nonsmooth version of the second-order sufficient condition for metric subregularity and show that it is a sufficient condition for the new sufficient condition for metric subregularity to hold. An example is used to illustrate that directional pseudo-normality can be weaker than FOSCMS. For the class of set-valued maps where the single-valued mapping is affine and the abstract set is the union of finitely many convex polyhedral sets, we show that pseudo-normality and hence directional pseudo-normality holds automatically at each point of the graph. Finally we apply our results to complementarity and Karush--Kuhn--Tucker systems.
In this paper we consider a class of structured nonsmooth difference-of-convex (DC) minimization in which the first convex component is the sum of a smooth and a nonsmooth function while the second convex component is the supremum of finitely many convex smooth functions. The existing methods for this problem usually have weak convergence guarantees or exhibit slow convergence. Due to this, we propose two nonmonotone enhanced proximal DC algorithms for solving this problem. For possible acceleration, one uses a nonmonotone line-search scheme in which the associated Lipschitz constant is adaptively approximated by some local curvature information of the smooth function in the first convex component, and the other employs an extrapolation scheme. It is shown that every accumulation point of the solution sequence generated by them is a D-stationary point of the problem. These methods may, however, become inefficient when the number of convex smooth functions in defining the second convex component is large. To remedy this issue, we propose randomized counterparts for them and show that every accumulation point of the generated solution sequence is a D-stationary point of the problem almost surely. Some preliminary numerical experiments are conducted to demonstrate the efficiency of the proposed algorithms.
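For context, these methods build on the basic proximal DC iteration: writing the objective as $g-h$ with both components convex (generic notation), one takes $\xi^k\in\partial h(x^k)$ and sets
\[
x^{k+1}\in\mathop{\arg\min}_{x}\Big\{g(x)-\langle\xi^k,x-x^k\rangle+\tfrac{1}{2\beta}\|x-x^k\|^{2}\Big\},
\]
a minimal sketch only; the enhanced algorithms of this paper refine how the information from the second component, here a supremum of finitely many smooth functions, enters the subproblem so as to reach D-stationary rather than merely critical points.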
Adaptive cubic regularization methods have emerged as a credible alternative to linesearch and trust-region for smooth nonconvex optimization, with optimal complexity amongst second-order methods. Here we consider a general/new class of adaptive regularization methods that use first- or higher-order local Taylor models of the objective regularized by a(ny) power of the step size and applied to convexly constrained optimization problems. We investigate the worst-case evaluation complexity/global rate of convergence of these algorithms, when the level of sufficient smoothness of the objective may be unknown or may even be absent. We find that the methods accurately reflect in their complexity the degree of smoothness of the objective and satisfy increasingly better bounds with improving model accuracy. The bounds vary continuously and robustly with respect to the regularization power and accuracy of the model and the degree of smoothness of the objective.
A Bregman Forward-Backward Linesearch Algorithm for Nonconvex Composite Optimization: Superlinear Convergence to Nonisolated Local Minima
Optimization problems with composite functions consist of an objective function which is the sum of a smooth and a (convex) nonsmooth term. This particular structure is exploited by the class of proximal gradient methods and some of their generalizations like proximal Newton and quasi-Newton methods. The current literature on these classes of methods almost exclusively considers the case where also the smooth term is convex. Here we present a globalized proximal Newton-type method which allows the smooth term to be nonconvex. The method is shown to have nice global and local convergence properties, and some numerical results indicate that this method is very promising also from a practical point of view.
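Schematically, a proximal Newton-type step for a composite objective $f+g$ (with $f$ smooth, here possibly nonconvex, and $g$ convex) computes, for some symmetric $H_k\approx\nabla^2 f(x_k)$, possibly regularized to be positive definite,
\[
d_k\in\mathop{\arg\min}_{d}\ \nabla f(x_k)^{\top}d+\tfrac12\,d^{\top}H_k d+g(x_k+d),\qquad x_{k+1}=x_k+\alpha_k d_k,
\]
with $\alpha_k$ chosen by a globalization strategy; with $H_k=\tfrac{1}{\gamma}I$ this reduces to the proximal gradient step.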
In this paper, we study local convergence of high-order Tensor Methods for solving convex optimization problems with composite objective. We justify local superlinear convergence under the assumption of uniform convexity of the smooth component, having Lipschitz-continuous high-order derivative. The convergence both in function value and in the norm of minimal subgradient is established. Global complexity bounds for the Composite Tensor Method in convex and uniformly convex cases are also discussed. Lastly, we show how local convergence of the methods can be globalized using the inexact proximal iterations.
In the existing literature, sequence convergence of inexact nonconvex and nonsmooth algorithms is proved under an unrealistic assumption on the noise. In this paper, we focus on removing this assumption. Without it, convergence can no longer be established with the previous framework and techniques. We therefore build a new proof framework that employs a pseudo sufficient descent condition and a pseudo relative error condition, both related to an auxiliary sequence, together with a continuity condition. In fact, many classical inexact nonconvex and nonsmooth algorithms satisfy these three conditions. Under an assumption on the auxiliary sequence, we prove that the sequence generated by the general algorithm converges to a critical point of the objective function provided the objective is semi-algebraic. The core of the proofs lies in building a new Lyapunov function, whose successive differences provide a bound for the successive differences of the points generated by the algorithm. We then apply our findings to the inexact nonconvex proximal inertial gradient algorithm and derive the corresponding convergence results.
The first moment and the second central moment of the portfolio return, a.k.a. the mean and variance, have been widely employed to assess the expected profit and risk of a portfolio. Investors pursue a higher mean and a lower variance when designing portfolios. These two moments describe the distribution of the portfolio return well when it follows a Gaussian distribution. However, the real-world distribution of asset returns is usually asymmetric and heavy-tailed, which is far from Gaussian. The asymmetry and heavy-tailedness are characterized by the third and fourth central moments, i.e., skewness and kurtosis, respectively. Higher skewness and lower kurtosis are preferred to reduce the probability of extreme losses. However, incorporating high-order moments in the portfolio design is very difficult due to their nonconvexity and a computational cost that grows rapidly with the dimension. In this paper, we propose a very efficient and convergence-provable algorithm framework based on the successive convex approximation (SCA) algorithm to solve high-order portfolio design problems. The efficiency of the proposed algorithm framework is demonstrated by numerical experiments.
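For orientation, a high-order (mean-variance-skewness-kurtosis) portfolio design of the kind addressed here typically trades off the four moments $\phi_1(w),\dots,\phi_4(w)$ of the portfolio return with weights $\lambda_i\ge 0$; a generic sketch (the exact model and constraint set used in the paper may differ) is
\[
\min_{w\in\mathcal{W}}\ -\lambda_1\phi_1(w)+\lambda_2\phi_2(w)-\lambda_3\phi_3(w)+\lambda_4\phi_4(w),
\]
where $\phi_1,\phi_2$ are the mean and variance, $\phi_3,\phi_4$ are the nonconvex skewness and kurtosis terms, and $\mathcal{W}$ is the set of admissible portfolios, e.g. $\{w\ge 0,\ \mathbf{1}^{\top}w=1\}$; the SCA framework replaces the nonconvex terms by convex surrogates at each iteration.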
In this paper, we study a fully composite formulation of convex optimization problems, which includes, as particular cases, problems with functional constraints, max-type minimization problems, and problems with simple nondifferentiable components. We treat all these formulations in a unified way, highlighting the existence of very natural optimization schemes of different orders $p \geq 1$. As a result, we obtain new high-order $(p \geq 2)$ optimization methods for the composite formulation. We prove global convergence rates for them under the most general conditions. Assuming that the upper-level component of our objective function is subhomogeneous, we develop efficient modifications of the basic fully composite first-order and second-order methods and propose their accelerated variants.
Composite optimization problems, where the sum of a smooth and a merely lower semicontinuous function has to be minimized, are often tackled numerically by means of proximal gradient methods as soon as the lower semicontinuous part of the objective function is of simple enough structure. The available convergence theory associated with these methods (mostly) requires the derivative of the smooth part of the objective function to be (globally) Lipschitz continuous, and this might be a restrictive assumption in some practically relevant scenarios. In this paper, we readdress this classical topic and provide convergence results for the classical (monotone) proximal gradient method and one of its nonmonotone extensions which are applicable in the absence of (strong) Lipschitz assumptions. This is possible since, for the price of forgoing convergence rates, we omit the use of descent-type lemmas in our analysis.
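For reference, writing the objective as $f+g$ with $f$ the smooth part and $g$ the merely lower semicontinuous part (generic notation), the classical (monotone) proximal gradient iteration discussed here is, for a stepsize $\gamma_k>0$,
\[
x^{k+1}\in\operatorname{prox}_{\gamma_k g}\big(x^k-\gamma_k\nabla f(x^k)\big)
=\mathop{\arg\min}_{x}\Big\{g(x)+\tfrac{1}{2\gamma_k}\big\|x-\big(x^k-\gamma_k\nabla f(x^k)\big)\big\|^{2}\Big\},
\]
where, in the absence of a global Lipschitz constant of $\nabla f$, the stepsize $\gamma_k$ typically has to be chosen by a backtracking-type rule rather than from the Lipschitz modulus.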
This paper is concerned with a class of optimization problems with the non-negative orthogonal constraint, in which the objective function is $L$-smooth on an open set containing the Stiefel manifold $\textrm{St}(n,r)$. We derive a locally Lipschitzian error bound for the feasible points without zero rows when $n>r>1$, and a global Lipschitzian error bound when $n>r=1$ or $n=r$. Then, we show that the penalty problem induced by the elementwise $\ell_1$-norm distance to the non-negative cone is a global exact penalty, and so is the one induced by its Moreau envelope under a lower second-order calmness of the objective function. A practical penalty algorithm is developed by approximately solving a series of smooth penalty problems with a retraction-based nonmonotone line-search proximal gradient method, and any cluster point of the generated sequence is shown to be a stationary point of the original problem. Numerical comparisons with the ALM [Wen, Z. W. & Yin, W. T. (2013), A feasible method for optimization with orthogonality constraints, Math. Programming, 142, 397–434] and the exact penalty method [Jiang, B., Meng, X., Wen, Z. W. & Chen, X. J. (2022), An exact penalty approach for optimization with nonnegative orthogonality constraints, Math. Programming, https://doi.org/10.1007/s10107-022-01794-8] indicate that our penalty method has an advantage in terms of the quality of solutions despite taking a little more time.
This note is concerned with a class of nonmonotone descent methods for minimizing a proper lower semicontinuous Kurdyka–Łojasiewicz (KL) function, which generate a sequence satisfying a nonmonotone decrease condition and a relative error tolerance. Under suitable assumptions, we prove that the whole sequence converges to a limiting critical point of the objective and, when the objective is a KL function of a given exponent, the convergence admits a linear rate if the exponent is not greater than $1/2$ and a sublinear rate associated with the exponent otherwise. These assumptions are shown to be sufficient and necessary if, in addition, the objective is weakly convex on a neighborhood of the set of critical points. Our results not only extend the convergence results of monotone descent methods for KL optimization problems but also resolve the convergence problem for the iterate sequence generated by a class of nonmonotone line search algorithms for nonconvex and nonsmooth problems.
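As an illustration of the two requirements (the precise constants and conditions are those of the note, not reproduced here), writing $f$ for the objective, a sequence $\{x^k\}$ in this class typically satisfies, for some $a,b>0$, a nonmonotone decrease condition such as
\[
f(x^{k+1})\ \le\ \max_{0\le j\le \min\{k,M\}} f(x^{k-j})\ -\ a\,\|x^{k+1}-x^k\|^{2},
\]
together with a relative error tolerance of the form $\operatorname{dist}\big(0,\partial f(x^{k+1})\big)\le b\,\|x^{k+1}-x^k\|$.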
Discover New Methods for Dealing with High-Dimensional Data. A sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data. Top experts in this rapidly evolving field, the authors describe the lasso for linear regression and a simple coordinate descent algorithm for its computation. They discuss the application of $\ell_1$ penalties to generalized linear models and support vector machines, cover generalized penalties such as the elastic net and group lasso, and review numerical methods for optimization. They also present statistical inference methods for fitted (lasso) models, including the bootstrap, Bayesian methods, and recently developed approaches. In addition, the book examines matrix decomposition, sparse multivariate analysis, graphical models, and compressed sensing. It concludes with a survey of theoretical results for the lasso. In this age of big data, the number of features measured on a person or object can be large and might be larger than the number of observations. This book shows how the sparsity assumption allows us to tackle these problems and extract useful and reproducible patterns from big datasets. Data analysts, computer scientists, and theorists will appreciate this thorough and up-to-date treatment of sparse statistical modeling.