
Given a real-analytic function $f:\mathbb{R}^{n} \rightarrow \mathbb{R}$ and a critical point $a \in \mathbb{R}^{n}$, the Łojasiewicz inequality asserts that there exists $\theta\in[\frac{1}{2},1)$ such that the function $|f-f(a)|^{\theta}\,\Vert\nabla f\Vert^{-1}$ remains bounded around $a$. In this paper, we extend the above result to a wide class of nonsmooth functions (that possibly admit the value $+\infty$), by establishing an analogous inequality in which the derivative $\nabla f(x)$ can be replaced by any element $x^{\ast}$ of the subdifferential $\partial f(x)$ of $f$. Like its smooth version, this result provides new insights into the convergence aspects of subgradient-type dynamical systems. Provided that the function $f$ is sufficiently regular (for instance, convex or lower-$C^{2}$), the bounded trajectories of the corresponding subgradient dynamical system can be shown to be of finite length. Explicit estimates of the rate of convergence are also derived.
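As a simple illustration of how the Łojasiewicz exponent behaves (a standard textbook-style computation, not taken from the paper), consider the one-variable monomials $f(x)=x^{2k}$ at the critical point $a=0$:

```latex
\frac{|f(x)-f(0)|^{\theta}}{\|\nabla f(x)\|}
  = \frac{|x|^{2k\theta}}{2k\,|x|^{2k-1}}
  = \frac{1}{2k}\,|x|^{\,2k\theta-(2k-1)},
```

which remains bounded near $0$ exactly when $\theta \ge 1-\frac{1}{2k}$. For $k=1$ the smallest admissible exponent is $\frac{1}{2}$, and it approaches $1$ as $k$ grows, consistent with the asserted range $[\frac{1}{2},1)$.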
Let $f$ be a continuous function on $\mathbb{R}^n$, and suppose $f$ is continuously differentiable on an open dense subset. Such functions arise in many applications, and very often minimizers are points at which $f$ is not differentiable. Of particular interest is the case where $f$ is not convex, and perhaps not even locally Lipschitz, but is a function whose gradient is easily computed where it is defined. We present a practical, robust algorithm to locally minimize such functions, based on gradient sampling. No subgradient information is required by the algorithm. When $f$ is locally Lipschitz and has bounded level sets, and the sampling radius $\varepsilon$ is fixed, we show that, with probability 1, the algorithm generates a sequence with a cluster point that is Clarke $\varepsilon$-stationary. Furthermore, we show that if $f$ has a unique Clarke stationary point $\bar x$, then the set of all cluster points generated by the algorithm converges to $\bar x$ as $\varepsilon$ is reduced to zero. Numerical results are presented demonstrating the robustness of the algorithm and its applicability in a wide variety of contexts, including cases where $f$ is not locally Lipschitz at minimizers. We report approximate local minimizers for functions in the applications literature which have not, to our knowledge, been obtained previously. When the termination criteria of the algorithm are satisfied, a precise statement about nearness to Clarke $\varepsilon$-stationarity is available. A MATLAB implementation of the algorithm is posted at http://www.cs.nyu.edu/overton/papers/gradsamp/alg.
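To convey the flavor of gradient sampling, here is a minimal sketch, not the authors' algorithm or their MATLAB code: it uses a fixed sampling radius, a fixed sample count, a crude backtracking step, and SciPy's SLSQP solver for the inner minimum-norm problem, all simplifying assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_combination(G):
    """Return the minimum-norm vector in the convex hull of the rows of G."""
    m = G.shape[0]
    objective = lambda lam: 0.5 * np.sum((lam @ G) ** 2)
    constraints = ({'type': 'eq', 'fun': lambda lam: np.sum(lam) - 1.0},)
    res = minimize(objective, np.full(m, 1.0 / m), method='SLSQP',
                   bounds=[(0.0, 1.0)] * m, constraints=constraints)
    return res.x @ G

def gradient_sampling(f, grad, x0, radius=1e-2, n_samples=20,
                      step=0.1, max_iter=300, tol=1e-6, seed=0):
    """Minimal gradient-sampling sketch: sample gradients near x and step
    along the negative of their minimum-norm convex combination."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Sample points in a small box around x; they are almost surely
        # points where the objective is differentiable.
        pts = x + radius * rng.uniform(-1.0, 1.0, size=(n_samples, x.size))
        G = np.array([grad(p) for p in pts])
        g = min_norm_combination(G)
        if np.linalg.norm(g) <= tol:          # approximately Clarke stationary
            break
        d = -g / np.linalg.norm(g)
        t = step
        while t > 1e-12 and f(x + t * d) >= f(x):   # crude backtracking
            t *= 0.5
        x = x + t * d
    return x

# Example: a simple nonsmooth convex function with minimizer at the origin.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])
grad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
print(gradient_sampling(f, grad, x0=[1.0, -0.7]))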
We study randomized variants of two classical algorithms: coordinate descent for systems of linear equations and iterated projections for systems of linear inequalities. Expanding on a recent randomized iterated projection algorithm of Strohmer and Vershynin (Strohmer, T., R. Vershynin. 2009. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15 262–278) for systems of linear equations, we show that, under appropriate probability distributions, the linear rates of convergence (in expectation) can be bounded in terms of natural linear-algebraic condition numbers for the problems. We relate these condition measures to distances to ill-posedness and discuss generalizations to convex systems under metric regularity assumptions.
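For concreteness, here is a small sketch of the classical randomized Kaczmarz iteration that the Strohmer–Vershynin analysis concerns, with rows chosen with probability proportional to their squared norms; it illustrates the method itself, not the condition-number bounds derived in the paper.

```python
import numpy as np

def randomized_kaczmarz(A, b, x0=None, n_iter=5000, seed=0):
    """Randomized Kaczmarz: at each step project the iterate onto the
    hyperplane {x : a_i^T x = b_i}, choosing row i with probability
    proportional to ||a_i||^2 (Strohmer-Vershynin sampling)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    row_norms_sq = np.sum(A ** 2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    for _ in range(n_iter):
        i = rng.choice(m, p=probs)
        a_i = A[i]
        x = x + (b[i] - a_i @ x) / row_norms_sq[i] * a_i
    return x

# Consistent random system: the error decays linearly in expectation.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
b = A @ x_true
print(np.linalg.norm(randomized_kaczmarz(A, b) - x_true))
```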
We establish the following result: If the graph of a lower semicontinuous real-extended-valued function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ admits a Whitney stratification (so in particular if $f$ is a semialgebraic function), then the norm of the gradient of $f$ at $x\in\operatorname{dom} f$ relative to the stratum containing $x$ bounds from below all norms of Clarke subgradients of $f$ at $x$. As a consequence, we obtain a Morse–Sard type of theorem as well as a nonsmooth extension of the Kurdyka–Łojasiewicz inequality for functions definable in an arbitrary o-minimal structure. It is worthwhile pointing out that, even in a smooth setting, this last result generalizes the one given in [K. Kurdyka, Ann. Inst. Fourier (Grenoble), 48 (1998), pp. 769–783] by removing the boundedness assumption on the domain of the function.
The proximal gradient algorithm for minimizing the sum of a smooth and nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the “error”—the distance to the solution set. We explain the observed linear convergence intuitively by proving the equivalence of such an error bound to a natural quadratic growth condition. Our approach generalizes to linear and quadratic convergence analysis for proximal methods (of Gauss-Newton type) for minimizing compositions of nonsmooth functions with smooth mappings. We observe incidentally that short step-lengths in the algorithm indicate near-stationarity, suggesting a reliable termination criterion.
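A minimal proximal-gradient sketch, using the standard lasso-type example rather than anything from the paper, showing how a short step length can serve as the termination measure suggested above.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, n_iter=500, tol=1e-8):
    """Minimize 0.5||Ax - b||^2 + lam*||x||_1 by proximal gradient,
    terminating when the step length (a surrogate for near-stationarity)
    is small."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x_new = soft_threshold(x - grad / L, lam / L)
        if np.linalg.norm(x_new - x) <= tol:   # short step => near-stationary
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30))
b = rng.standard_normal(100)
print(proximal_gradient(A, b, lam=0.5)[:5])
```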
We prove that if two smooth manifolds intersect transversally, then the method of alternating projections converges locally at a linear rate. We bound the speed of convergence in terms of the angle between the manifolds, which in turn we relate to the modulus of metric regularity for the intersection problem, a natural measure of conditioning. We discuss a variety of problem classes where the projections are computationally tractable, and we illustrate the method numerically on a problem of finding a low-rank solution of a matrix equation.
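A toy numerical illustration of alternating projections between two smooth manifolds (here the unit sphere and a plane in $\mathbb{R}^3$, chosen only because both projections are explicit); the shrinking gap printed below is merely suggestive of the linear rate proved in general.

```python
import numpy as np

# Manifold M: the unit sphere in R^3; manifold N: the plane {x : x[2] = 0.5}.
# They intersect transversally in a circle.
proj_sphere = lambda x: x / np.linalg.norm(x)
def proj_plane(x):
    y = x.copy()
    y[2] = 0.5
    return y

x = np.array([0.9, 0.3, 0.9])
for k in range(15):
    x = proj_sphere(proj_plane(x))
    # Distance from the current (sphere) iterate to the plane: shrinks linearly.
    print(k, abs(x[2] - 0.5))
```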
This paper considers the minimization of a convex integral functional over the positive cone of an $L_p$ space, subject to a finite number of linear equality constraints. Such problems arise in spectral estimation, where the objective function is often entropy-like, and in constrained approximation. The Lagrangian dual problem is finite-dimensional and unconstrained. Under a quasi-interior constraint qualification, the primal and dual values are equal, with dual attainment. Examples show the primal value may not be attained. Conditions are given that ensure that the primal optimal solution can be calculated directly from a dual optimum. These conditions are satisfied in many examples.
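To make the finite-dimensional, unconstrained dual concrete, here is a small numerical sketch under simplifying assumptions (Boltzmann–Shannon entropy on $[0,1]$, polynomial moments, quadrature on a grid, and a synthetic test density); it follows the standard dual construction, in which the primal solution has the form $x(t)=\exp(\sum_i \lambda_i t^i - 1)$, and is not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Moments b_i = \int_0^1 t^i xbar(t) dt of a test density xbar.
t = np.linspace(0.0, 1.0, 2001)
w = np.gradient(t)                                 # simple quadrature weights
x_bar = 1.0 + 0.5 * np.cos(2 * np.pi * t)          # synthetic "unknown" density
powers = np.vstack([t ** i for i in range(4)])     # moment functions 1, t, t^2, t^3
b = powers @ (w * x_bar)

# Dual objective: lambda -> \int exp(lambda . a(t) - 1) dt - lambda . b,
# finite-dimensional and unconstrained; its minimizer recovers the primal.
def dual(lam):
    return np.sum(w * np.exp(lam @ powers - 1.0)) - lam @ b

res = minimize(dual, np.zeros(4), method='BFGS')
x_est = np.exp(res.x @ powers - 1.0)               # best-entropy estimate
print(powers @ (w * x_est) - b)                    # moment residuals ~ 0
```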
Metric regularity is a central concept in variational analysis for the study of solution mappings associated with “generalized equations”, including variational inequalities and parameterized constraint systems. Here it is employed to characterize the distance to irregularity or infeasibility with respect to perturbations of the system structure. Generalizations of the Eckart-Young theorem in numerical analysis are obtained in particular.
There is growing interest in optimization problems with real symmetric matrices as variables. Generally the matrix functions involved are spectral: they depend only on the eigenvalues of the matrix. It is known that convex spectral functions can be characterized exactly as symmetric convex functions of the eigenvalues. A new approach to this characterization is given, via a simple Fenchel conjugacy formula. We then apply this formula to derive expressions for subdifferentials, and to study duality relationships for convex optimization problems with positive semidefinite matrices as variables. Analogous results hold for Hermitian matrices.
Optimization problems involving eigenvalues arise in many different mathematical disciplines. This article is divided into two parts. Part I gives a historical account of the development of the field. We discuss various applications that have been especially influential, from structural analysis to combinatorial optimization, and we survey algorithmic developments, including the recent advance of interior-point methods for a specific problem class: semidefinite programming. In Part II we primarily address optimization of convex functions of eigenvalues of symmetric matrices subject to linear constraints. We derive a fairly complete mathematical theory, some of it classical and some of it new. Using the elegant language of conjugate duality theory, we highlight the parallels between the analysis of invariant matrix norms and weakly invariant convex matrix functions. We then restrict our attention further to linear and semidefinite programming, emphasizing the parallel duality theory and comparing primal-dual interior-point methods for the two problem classes. The final section presents some apparently new variational results about eigenvalues of nonsymmetric matrices, unifying known characterizations of the spectral abscissa (related to Lyapunov theory) and the spectral radius (as an infimum of matrix norms).
Nonsmoothness pervades optimization, but the way it typically arises is highly structured. Nonsmooth behavior of an objective function is usually associated, locally, with an active manifold: on this manifold the function is smooth, whereas in normal directions it is "vee-shaped." Active set ideas in optimization depend heavily on this structure. Important examples of such functions include the pointwise maximum of some smooth functions and the maximum eigenvalue of a parametrized symmetric matrix. Among possible foundations for practical nonsmooth optimization, this broad class of "partly smooth" functions seems a promising candidate, enjoying a powerful calculus and sensitivity theory. In particular, we show under a natural regularity condition that critical points of partly smooth functions are stable: small perturbations to the function cause small movements of the critical point on the active manifold.
Given a finite number of moments of an unknown density $\bar x$ on a finite measure space, the best entropy estimate—that nonnegative density $x$ with the given moments which minimizes the Boltzmann–Shannon entropy $I(x) := \int x\log x$—is considered. A direct proof is given that $I$ has the Kadec property in $L_1$—if $y_n$ converges weakly to $\bar y$ and $I(y_n)$ converges to $I(\bar y)$, then $y_n$ converges to $\bar y$ in norm. As a corollary, it is obtained that, as the number of given moments increases, the best entropy estimates converge in $L_1$ norm to the best entropy estimate of the limiting problem, which is simply $\bar x$ in the determined case. Furthermore, for classical moment problems on intervals with $\bar x$ strictly positive and sufficiently smooth, error bounds and uniform convergence are actually obtained.
Nonsmooth variational analysis and related computational methods are powerful tools that can be effectively applied to identify local minimizers of nonconvex optimization problems arising in fixed-order controller design. We support this claim by applying nonsmooth analysis and methods to a challenging "Belgian chocolate" stabilization problem posed in 1994: find a stable, minimum phase, rational controller that stabilizes a specified second-order plant. Although easily stated, this particular problem remained unsolved until 2002, when a solution was found using an eleventh-order controller. Our computational methods find a stabilizing third-order controller without difficulty, suggesting explicit formulas for the controller and for the closed loop system, which has only one pole with multiplicity 5. Furthermore, our analytical techniques prove that this controller is locally optimal in the sense that there is no nearby controller with the same order for which the closed loop system has all its poles further left in the complex plane. Although the focus of the paper is stabilization, once a stabilizing controller is obtained, the same computational techniques can be used to optimize various measures of the closed loop system, including its complex stability radius or $H_\infty$ performance.
A spectral function of a Hermitian matrix $X$ is a function which depends only on the eigenvalues of $X$, $\lambda_1(X) \ge \lambda_2(X) \ge \cdots \ge \lambda_n(X)$, and hence may be written $f(\lambda_1(X), \lambda_2(X), \ldots, \lambda_n(X))$ for some symmetric function $f$. Such functions appear in a wide variety of matrix optimization problems. We give a simple proof that this spectral function is differentiable at $X$ if and only if the function $f$ is differentiable at the vector $\lambda(X)$, and we give a concise formula for the derivative. We then apply this formula to deduce an analogous expression for the Clarke generalized gradient of the spectral function. A similar result holds for real symmetric matrices.
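The derivative formula in question, in its standard form $\nabla F(X) = U \operatorname{Diag}(\nabla f(\lambda(X)))\, U^{T}$ where $X = U \operatorname{Diag}(\lambda(X))\, U^{T}$, is easy to sanity-check numerically. The snippet below is only an illustrative finite-difference check on a random real symmetric matrix, with $f$ chosen as the log-sum-exp of the eigenvalues; it is not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
X = (A + A.T) / 2                          # random symmetric matrix

# Symmetric smooth f: log-sum-exp of the eigenvalues.
f = lambda lam: np.log(np.sum(np.exp(lam)))
grad_f = lambda lam: np.exp(lam) / np.sum(np.exp(lam))
F = lambda M: f(np.linalg.eigvalsh(M))

# Spectral gradient formula: U diag(grad_f(lambda)) U^T.
lam, U = np.linalg.eigh(X)
G = U @ np.diag(grad_f(lam)) @ U.T

# Finite-difference directional derivative in a random symmetric direction.
B = rng.standard_normal((n, n))
D = (B + B.T) / 2
h = 1e-6
fd = (F(X + h * D) - F(X - h * D)) / (2 * h)
print(fd, np.sum(G * D))                   # the two numbers should agree closely
```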
In 1958 Lax conjectured that hyperbolic polynomials in three variables are determinants of linear combinations of three symmetric matrices. This conjecture is equivalent to a recent observation of Helton and Vinnikov.
A homogeneous real polynomial $p$ is hyperbolic with respect to a given vector $d$ if the univariate polynomial $t \mapsto p(x - td)$ has all real roots for all vectors $x$. Motivated by partial differential equations, Gårding proved in 1951 that the largest such root is a convex function of $x$, and showed various ways of constructing new hyperbolic polynomials. We present a powerful new such construction, and use it to generalize Gårding's result to arbitrary symmetric functions of the roots. Many classical and recent inequalities follow easily. We develop various convex-analytic tools for such symmetric functions, of interest in interior-point methods for optimization problems over related cones.
Best entropy estimation is a technique that has been widely applied in many areas of science. It consists of estimating an unknown density from some of its moments by maximizing some measure of the entropy of the estimate. This problem can be modelled as a partially-finite convex program, with an integrable function as the variable. A complete duality and existence theory is developed for this problem and for an associated extended problem which allows singular, measure-theoretic solutions. This theory explains the appearance of singular components observed in the literature when the Burg entropy is used. It also provides a unified treatment of existence conditions when the Burg, Boltzmann-Shannon, or some other entropy is used as the objective. Some examples are discussed.
Many interesting real functions on Euclidean space are differentiable almost everywhere. All Lipschitz functions have this property, but so, for example, does the spectral abscissa of a matrix (a non-Lipschitz function). In practice, the gradient is often easy to compute. We investigate to what extent we can approximate the Clarke subdifferential of such a function at some point by calculating the convex hull of some gradients sampled at random nearby points.
Dykstra's algorithm and the method of cyclic Bregman projections are often employed to solve best approximation and convex feasibility problems, which are fundamental in mathematics and the physical sciences. Censor and Reich very recently suggested a synthesis of these methods, Dykstra's algorithm with Bregman projections, to tackle a non-orthogonal best approximation problem. They obtained convergence when each constraint is a halfspace. It is shown here that this new algorithm works for general closed convex constraints; this complements Censor and Reich's result and relates to a framework by Tseng. The proof rests on Boyle and Dykstra's original work and on strong properties of Bregman distances corresponding to Legendre functions. Special cases and observations simplifying the implementation of the algorithm are also discussed.
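For orientation, here is the classical orthogonal Dykstra iteration for two closed convex sets, the special case covered by the Boyle–Dykstra analysis mentioned above; the Bregman variant studied in the paper replaces orthogonal projections by Bregman projections. This is a minimal sketch assuming both projections are available in closed form.

```python
import numpy as np

def dykstra(proj_C, proj_D, x0, n_iter=200):
    """Orthogonal Dykstra's algorithm: converges to the projection of x0
    onto the intersection of the closed convex sets C and D."""
    x = np.asarray(x0, dtype=float)
    p = np.zeros_like(x)   # correction term for C
    q = np.zeros_like(x)   # correction term for D
    for _ in range(n_iter):
        y = proj_C(x + p)
        p = x + p - y
        x = proj_D(y + q)
        q = y + q - x
    return x

# Example: C = unit ball, D = halfspace {x : x[0] >= 0.8}.
proj_C = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
def proj_D(x):
    y = x.copy()
    y[0] = max(y[0], 0.8)
    return y

# Projection of (-1, 1) onto the intersection (approximately (0.8, 0.6)).
print(dykstra(proj_C, proj_D, np.array([-1.0, 1.0])))
```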
We prove that uniform second-order growth, tilt stability, and strong metric regularity of the subdifferential---three notions that have appeared in entirely different settings---are all essentially equivalent for any lower-semicontinuous, extended real-valued function.
A function $F$ on the space of $n \times n$ real symmetric matrices is called spectral if it depends only on the eigenvalues of its argument. Spectral functions are just symmetric functions of the eigenvalues. We show that a spectral function is twice (continuously) differentiable at a matrix if and only if the corresponding symmetric function is twice (continuously) differentiable at the vector of eigenvalues. We give a concise and usable formula for the Hessian.
Robust stability and a criss-cross algorithm for pseudospectra. J. V. Burke, A. S. Lewis, M. L. Overton. IMA Journal of Numerical Analysis, 23(3), July 2003, pp. 359–375. https://doi.org/10.1093/imanum/23.3.359
We compare two recent variational-analytic approaches to second-order conditions and sensitivity analysis for nonsmooth optimization. We describe a broad setting where computing the generalized Hessian of Mordukhovich is easy. In this setting, the idea of tilt stability introduced by Poliquin and Rockafellar is equivalent to a classical smooth second-order condition.
Certain interesting classes of functions on a real inner product space are invariant under an associated group of orthogonal linear transformations. This invariance can be made explicit via a simple decomposition. For example, rotationally invariant functions on $\mathbb{R}^2$ are just even functions of the Euclidean norm, and functions on the Hermitian matrices (with trace inner product) which are invariant under unitary similarity transformations are just symmetric functions of the eigenvalues. We develop a framework for answering geometric and analytic (both classical and nonsmooth) questions about such a function by answering the corresponding question for the (much simpler) function appearing in the decomposition. The aim is to understand and extend the foundations of eigenvalue optimization, matrix approximation, and semidefinite programming.
The $\epsilon$-pseudospectrum of a matrix $A$ is the subset of the complex plane consisting of all eigenvalues of all complex matrices within a distance $\epsilon$ of $A$. We are interested in two aspects of "optimization and pseudospectra." The first concerns maximizing the function "real part" over an $\epsilon$-pseudospectrum of a fixed matrix: this defines a function known as the $\epsilon$-pseudospectral abscissa of a matrix. We present a bisection algorithm to compute this function. Our second interest is in minimizing the $\epsilon$-pseudospectral abscissa over a set of feasible matrices. A prerequisite for local optimization of this function is an understanding of its variational properties, the study of which is the main focus of the paper. We show that, in a neighborhood of any nonderogatory matrix, the $\epsilon$-pseudospectral abscissa is a nonsmooth but locally Lipschitz and subdifferentially regular function for sufficiently small $\epsilon$; in fact, it can be expressed locally as the maximum of a finite number of smooth functions. Along the way we obtain an eigenvalue perturbation result: near a nonderogatory matrix, the eigenvalues satisfy a Hölder continuity property on matrix space---a property that is well known when only a single perturbation parameter is considered. The pseudospectral abscissa is a powerful modeling tool: not only is it a robust measure of stability, but it also reveals the transient (as opposed to asymptotic) behavior of associated dynamical systems.
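The $\epsilon$-pseudospectral abscissa can be approximated crudely from the singular-value characterization $\max\{\operatorname{Re} z : \sigma_{\min}(zI - A) \le \epsilon\}$ by scanning a grid in the complex plane. The sketch below is only this brute-force illustration, with a made-up test matrix; it is not the bisection (or later criss-cross) algorithm developed by the authors.

```python
import numpy as np

def pseudospectral_abscissa_grid(A, eps, re_range, im_range, n=200):
    """Brute-force estimate of max{Re z : sigma_min(z I - A) <= eps} over a
    rectangular grid (coarse, and unreliable if the grid misses the
    pseudospectrum's rightmost points)."""
    I = np.eye(A.shape[0])
    best = -np.inf
    for re in np.linspace(*re_range, n):
        for im in np.linspace(*im_range, n):
            z = re + 1j * im
            smin = np.linalg.svd(z * I - A, compute_uv=False)[-1]
            if smin <= eps:
                best = max(best, re)
    return best

A = np.array([[-1.0, 5.0],
              [ 0.0, -2.0]])    # stable but highly nonnormal test matrix
print(pseudospectral_abscissa_grid(A, eps=0.1, re_range=(-3, 2), im_range=(-3, 3)))
```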
We study the problem of estimating a nonnegative density, given a finite number of moments. Such problems arise in numerous practical applications. As the number of moments increases, the estimates will always converge weak$^\ast$ as measures, but need not converge weakly in $L_1$. This is related to the existence of functions on a compact metric space which are not essentially Riemann integrable (in some suitable sense). We characterize the type of weak convergence we can expect in terms of Riemann integrability, and in some cases give error bounds. When the estimates are chosen to minimize an objective function with weakly compact level sets (such as the Boltzmann–Shannon entropy) they will converge weakly in $L_1$. When an $L_p$ norm $(1 < p < \infty)$ is used as the objective, the estimates actually converge in norm. These results provide theoretical support to the growing popularity of such methods in practice.
We show that 2-norm pseudospectra of m-by-n matrices have no more than 2m(4m - 1) connected components. Such bounds are pertinent for computing the distance to uncontrollability of a control system, since this distance is the minimum value of a function whose level sets are pseudospectra. We also discuss algorithms for computing this distance, including a trisection variant of Gu's recent algorithm, and we show how these may be used to locally maximize the distance to uncontrollability for a parameterized system.
Given an affine subspace of square matrices, we consider the problem of minimizing the spectral abscissa (the largest real part of an eigenvalue). We give an example whose optimal solution has Jordan form consisting of a single Jordan block, and we show, using non-Lipschitz variational analysis, that this behaviour persists under arbitrarily small perturbations to the example. Thus although matrices with nontrivial Jordan structure are rare in the space of all matrices, they appear naturally in spectral abscissa minimization.
Many of the primal ingredients of convex optimization extend naturally from Euclidean to Hadamard spaces: nonpositively curved metric spaces like Euclidean, Hilbert, and hyperbolic spaces, metric trees, and more general CAT(0) cubical complexes. Linear structure, however, and the duality theory it supports are absent. Nonetheless, we introduce a new type of subgradient for convex functions on Hadamard spaces, based on Busemann functions. This notion supports a splitting subgradient method with guaranteed complexity bounds. In particular, the algorithm solves $p$-mean problems in general Hadamard spaces: we illustrate by computing medians in BHV tree space.
Variational analysis presents a unified theory encompassing in particular both smoothness and convexity. In a Euclidean space, convex sets and smooth manifolds both have straightforward local geometry. However, in the most basic hybrid case of feasible regions consisting of pre-images of convex sets under maps that are once (but not necessarily twice) continuously differentiable, the geometry is less transparent. We define a new approximate convexity property, that holds both for such feasible regions and also for all prox-regular sets. This new property requires that nearby points can always be joined by smooth feasible paths that are almost straight. In particular, in the terminology of real algebraic geometry, such feasible regions are locally normally embedded in the Euclidean space.
Geodesic metric spaces support a variety of averaging constructions for given finite sets. Computing such averages has generated extensive interest in diverse disciplines. Here we consider the inverse problem of recognizing computationally whether or not a given point is such an average, exactly or approximately. In nonpositively curved spaces, several averaging notions, including the usual weighted barycenter, produce the same "mean set". In such spaces, at points where the tangent cone is a Euclidean space, the recognition problem reduces to Euclidean projection onto a polytope. Hadamard manifolds comprise one example. Another consists of CAT(0) cubical complexes, at relative-interior points: the recognition problem is harder for general points, but we present an efficient semidefinite-programming-based algorithm.
Goldstein's 1977 idealized iteration for minimizing a Lipschitz objective fixes a distance (the step size) and relies on a certain approximate subgradient. That "Goldstein subgradient" is the shortest convex combination of objective gradients at points within that distance of the current iterate. A recent implementable Goldstein-style algorithm allows a remarkable complexity analysis (Zhang et al. 2020), and a more sophisticated variant (Davis and Jiang, 2022) leverages typical objective geometry to force near-linear convergence. To explore such methods, we introduce a new modulus, based on Goldstein subgradients, that robustly measures the slope of a Lipschitz function. We relate near-linear convergence of Goldstein-style methods to linear growth of this modulus at minimizers. We illustrate the idea computationally with a simple heuristic for Lipschitz minimization.
We consider geodesically convex optimization problems involving distances to a finite set of points $A$ in a CAT(0) cubical complex. Examples include the minimum enclosing ball problem, the weighted mean and median problems, and the feasibility and projection problems for intersecting balls with centers in $A$. We propose a decomposition approach relying on standard Euclidean cutting plane algorithms. The cutting planes are readily derivable from efficient algorithms for computing geodesics in the complex.
To explore convex optimization on Hadamard spaces, we consider an iteration in the style of a subgradient algorithm. Traditionally, such methods assume that the underlying spaces are manifolds and that the objectives are geodesically convex: the methods are described using tangent spaces and exponential maps. By contrast, our iteration applies in a general Hadamard space, is framed in the underlying space itself, and relies instead on horospherical convexity of the objective level sets. For this restricted class of objectives, we prove a complexity result of the usual form. Notably, the complexity does not depend on a lower bound on the space curvature.
We study the impact of nonconvexity on the complexity of nonsmooth optimization, emphasizing objectives such as piecewise linear functions, which may not be weakly convex. We focus on a dimension-independent analysis, slightly modifying a 2020 black-box algorithm of Zhang-Lin-Jegelka-Sra-Jadbabaie that approximates an $\epsilon$-stationary point of any directionally differentiable Lipschitz objective using $O(\epsilon^{-4})$ calls to a specialized subgradient oracle and a randomized line search. Seeking by contrast a deterministic method, we present a simple black-box version that achieves $O(\epsilon^{-5})$ for any difference-of-convex objective and $O(\epsilon^{-4})$ for the weakly convex case. Our complexity bound depends on a natural nonconvexity modulus that is related, intriguingly, to the negative part of directional second derivatives of the objective, understood in the distributional sense. Funding: This work was supported by the National Science Foundation [Grant DMS-2006990].
For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable, and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While our iteration was designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon. Keywords: first-order method, nonsmooth optimization, gradient descent, local linear convergence. MSC codes: 90C25, 65K05, 49M37.
Differentiable structure ensures that many of the basics of classical convex analysis extend naturally from Euclidean space to Riemannian manifolds. Without such structure, however, extensions are more challenging. Nonetheless, in Alexandrov spaces with curvature bounded above (but possibly positive), we develop several basic building blocks. We define subgradients via projection and the normal cone, prove their existence, and relate them to the classical affine minorant property. Then, in what amounts to a simple calculus or duality result, we develop a necessary optimality condition for minimizing the sum of two convex functions.
A central tool for understanding first-order optimization algorithms is the Kurdyka-Łojasiewicz inequality. Standard approaches to such methods rely crucially on this inequality to leverage sufficient decrease conditions involving gradients or subgradients. However, the KL property fundamentally concerns not subgradients but rather "slope", a purely metric notion. By highlighting this view, and avoiding any use of subgradients, we present a simple and concise complexity analysis for first-order optimization algorithms on metric spaces. This subgradient-free perspective also frames a short and focused proof of the KL property for nonsmooth semi-algebraic functions.
We consider the popular and classical method of alternating projections for finding a point in the intersection of two closed sets. By situating the algorithm in a metric space, equipped only with well-behaved geodesics and angles (in the sense of Alexandrov), we are able to highlight the two key geometric ingredients in a standard intuitive analysis of local linear convergence. The first is a transversality-like condition on the intersection; the second is a convexity-like condition on one set: "uniform approximation by geodesics."
In optimization, the notion of a partly smooth objective function is powerful for applications in algorithmic convergence and postoptimality analysis, and yet is complex to define. A shift in focus to the first-order optimality conditions reduces the concept to a simple constant-rank condition. In this view, partial smoothness extends to more general variational systems, encompassing in particular the saddlepoint operators underlying popular primal-dual splitting algorithms. For a broad class of semi-algebraic generalized equations, partial smoothness holds generically.
Identifiability, and the closely related idea of partial smoothness, unify classical active set methods and more general notions of solution structure. Diverse optimization algorithms generate iterates in discrete time that are eventually confined to identifiable sets. We present two fresh perspectives on identifiability. The first distills the notion to a simple metric property, applicable not just in Euclidean settings but to optimization over manifolds and beyond; the second reveals analogous continuous-time behavior for subgradient descent curves. The Kurdyka-Łojasiewicz property typically governs convergence in both discrete and continuous time: we explore its interplay with identifiability.
We study the impact of nonconvexity on the complexity of nonsmooth optimization, emphasizing objectives such as piecewise linear functions, which may not be weakly convex. We focus on a dimension-independent analysis, slightly modifying a black-box algorithm of Zhang et al. (2020) that approximates an $\epsilon$-stationary point of any directionally differentiable Lipschitz objective using $O(\epsilon^{-4})$ calls to a specialized subgradient oracle and a randomized line search. Our simple black-box deterministic version achieves $O(\epsilon^{-5})$ for any difference-of-convex objective, and $O(\epsilon^{-4})$ for the weakly convex case. Our complexity bound depends on a natural nonconvexity modulus, related, intriguingly, to the negative part of directional second derivatives of the objective, understood in the distributional sense.
The classical Clarke subdifferential alone is inadequate for understanding automatic differentiation in nonsmooth contexts. Instead, we can sometimes rely on enlarged generalized gradients called "conservative fields", defined through the natural path-wise chain rule: one application is the convergence analysis of gradient-based deep learning algorithms. In the semi-algebraic case, we show that all conservative fields are in fact just Clarke subdifferentials plus normals of manifolds in underlying Whitney stratifications.
The classical Clarke subdifferential alone is inadequate for understanding automatic differentiation in nonsmooth contexts. Instead, we can sometimes rely on enlarged generalized gradients called "conservative fields," defined through the natural pathwise chain rule: one application is the convergence analysis of gradient-based deep learning algorithms. In the semialgebraic case, we show that all conservative fields are in fact just Clarke subdifferentials plus normals of manifolds in underlying Whitney stratifications.
For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a "max-of-smooth" model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
Diverse optimization algorithms correctly identify, in finite time, intrinsic constraints that must be active at optimality. Analogous behavior extends beyond optimization to systems involving partly smooth operators, and in particular to variational inequalities over partly smooth sets. As in classical nonlinear programming, such active-set structure underlies the design of accelerated local algorithms of Newton type. We formalize this idea in broad generality as a simple linearization scheme for two intersecting manifolds.
Optimal matrices for problems involving the matrix numerical radius often have fields of values that are disks, a phenomenon associated with partial smoothness. Such matrices are highly structured: we experiment in particular with the proximal mapping for the radius, which often maps n-by-n random matrix inputs into a particular manifold of disk matrices that has real codimension 2n. The outputs, computed via semidefinite programming, also satisfy an unusual rank property at optimality.
Solutions to optimization problems involving the numerical radius often belong to the class of "disk matrices": those whose field of values is a circular disk in the complex plane centered at zero. We investigate this phenomenon using the variational-analytic idea of partial smoothness. We give conditions under which the set of disk matrices is locally a manifold $\mathcal M$, with respect to which the numerical radius $r$ is partly smooth, implying that $r$ is smooth when restricted to $\mathcal M$ but strictly nonsmooth when restricted to lines transversal to $\mathcal M$. Consequently, minimizers of the numerical radius of a parametrized matrix often lie in $\mathcal M$. Partial smoothness holds, in particular, at $n\times n$ matrices with exactly $n-1$ nonzeros, all on the superdiagonal. On the other hand, in the real 18-dimensional vector space of complex $3\times 3$ matrices, the disk matrices comprise the closure of a semialgebraic manifold $\mathcal L$ with dimension 12, and the numerical radius is partly smooth with respect to $\mathcal L$.
Superlinear convergence has been an elusive goal for black-box nonsmooth optimization. Even in the convex case, the subgradient method is very slow, and while some cutting plane algorithms, including traditional bundle methods, are popular in practice, local convergence is still sluggish. Faster variants depend either on problem structure or on analyses that elide sequences of "null" steps. Motivated by a semi-structured approach to optimization and the sequential quadratic programming philosophy, we describe a new bundle Newton method that incorporates second-order objective information with the usual linear approximation oracle. One representative problem class consists of maxima of several smooth functions, individually inaccessible to the oracle. Given as additional input just the cardinality of the optimal active set, we prove local quadratic convergence. A simple implementation shows promise on more general functions, both convex and nonconvex, and suggests first-order analogues.
Solutions to optimization problems involving the numerical radius often belong to a special class: the set of matrices having field of values a disk centered at the origin. After illustrating this phenomenon with some examples, we illuminate it by studying matrices around which this set of "disk matrices" is a manifold with respect to which the numerical radius is partly smooth. We then apply our results to matrices whose nonzeros consist of a single superdiagonal, such as Jordan blocks and the Crabb matrix related to a well-known conjecture of Crouzeix. Finally, we consider arbitrary complex three-by-three matrices; even in this case, the details are surprisingly intricate. One of our results is that in this real vector space with dimension 18, the set of disk matrices is a semi-algebraic manifold with dimension 12.
Given two arbitrary closed sets in Euclidean space, a simple transversality condition guarantees that the method of alternating projections converges locally, at linear rate, to a point in the intersection. Exact projection onto nonconvex sets is typically intractable, but we show that computationally-cheap inexact projections may suffice instead. In particular, if one set is defined by sufficiently regular smooth constraints, then projecting onto the approximation obtained by linearizing those constraints around the current iterate suffices. On the other hand, if one set is a smooth manifold represented through local coordinates, then the approximate projection resulting from linearizing the coordinate system around the preceding iterate on the manifold also suffices.
The BFGS quasi-Newton methodology, popular for smooth minimization, has also proved surprisingly effective in nonsmooth optimization. Through a variety of simple examples and computational experiments, we explore how the BFGS matrix update improves the local metric associated with a convex function even in the absence of smoothness and without using a line search. We compare the behavior of the BFGS and Shor r-algorithm updates.
The popular BFGS quasi-Newton minimization algorithm under reasonable conditions converges globally on smooth convex functions. This result was proved by Powell in a landmark 1976 paper: we consider its implications for functions that are not smooth. In particular, an analogous convergence result holds for functions (like the Euclidean norm) whose minimizers are isolated nonsmooth points.
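A quick experiment in the spirit of this observation, using SciPy's off-the-shelf BFGS rather than any implementation from the paper: running BFGS on the Euclidean norm, which is nonsmooth exactly at its minimizer, still tends to drive the iterates very close to the origin before the line search fails.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.linalg.norm(x)
grad = lambda x: x / np.linalg.norm(x)      # gradient wherever x != 0

x0 = np.array([3.0, -4.0, 1.0])
res = minimize(f, x0, jac=grad, method='BFGS',
               options={'gtol': 1e-12, 'maxiter': 500})
# Typically terminates with a line-search warning, but with f(res.x) tiny,
# i.e. the final iterate is very close to the nonsmooth minimizer 0.
print(res.x, f(res.x))
```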
The idea of partial smoothness in optimization blends certain smooth and nonsmooth properties of feasible regions and objective functions. As a consequence, the standard first-order conditions guarantee that diverse iterative algorithms (and post-optimality analyses) identify active structure or constraints. However, by instead focusing directly on the first-order conditions, the formal concept of partial smoothness simplifies dramatically: in basic differential geometric language, it is just a constant-rank condition. In this view, partial smoothness extends to more general mappings, such as saddlepoint operators underlying primal-dual splitting algorithms.
This paper reviews the gradient sampling methodology for solving nonsmooth, nonconvex optimization problems. An intuitively straightforward gradient sampling algorithm is stated and its convergence properties are summarized. Throughout this discussion, we emphasize the simplicity of gradient sampling as an extension of the steepest descent method for minimizing smooth objectives. We then provide overviews of various enhancements that have been proposed to improve practical performance, as well as of several extensions that have been made in the literature, such as to solve constrained problems. The paper also includes clarification of certain technical aspects of the analysis of gradient sampling algorithms, most notably related to the assumptions one needs to make about the set of points at which the objective is continuously differentiable. Finally, we discuss possible future research directions.
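The following bare-bones sketch (illustrative only, and not the precise algorithm reviewed in the paper) conveys the basic step: sample gradients near the current point, take an approximate smallest-norm vector in their convex hull as a search direction, and shrink the sampling radius when that vector is small. The helper min_norm_in_hull, the sampling box, and the line-search constants are ad hoc choices for this example.

    import numpy as np

    # Bare-bones sketch of a gradient sampling step (illustrative only; it omits the
    # differentiability safeguards and the precise radius/termination rules discussed
    # in the paper). min_norm_in_hull approximates the smallest-norm vector in the
    # convex hull of sampled gradients with a few Frank-Wolfe iterations.

    def min_norm_in_hull(G, iters=200):
        # G: (m, n) array whose rows are sampled gradients
        d = G.mean(axis=0)
        for _ in range(iters):
            j = np.argmin(G @ d)                  # Frank-Wolfe vertex
            diff = d - G[j]
            denom = diff @ diff
            if denom < 1e-16:
                break
            gamma = np.clip(diff @ d / denom, 0.0, 1.0)
            d = (1.0 - gamma) * d + gamma * G[j]
        return d

    def gradient_sampling(f, grad, x, eps=0.1, m=20, iters=200, seed=0):
        rng = np.random.default_rng(seed)
        for _ in range(iters):
            # gradients at the current point and at m nearby random points
            pts = x + eps * rng.uniform(-1.0, 1.0, size=(m, x.size))
            G = np.vstack([grad(x)] + [grad(p) for p in pts])
            d = min_norm_in_hull(G)
            if np.linalg.norm(d) <= 1e-8:
                eps *= 0.5                        # shrink the sampling radius
                continue
            t = 1.0                               # crude backtracking line search
            while f(x - t * d) > f(x) - 1e-4 * t * (d @ d) and t > 1e-12:
                t *= 0.5
            x = x - t * d
        return x

    f = lambda x: abs(x[0]) + 2.0 * abs(x[1])     # simple nonsmooth convex test function
    grad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
    print(gradient_sampling(f, grad, np.array([3.0, -2.0])))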
The popular BFGS quasi-Newton minimization algorithm under reasonable conditions converges globally on smooth convex functions. This result was proved by Powell in 1976: we consider its implications for functions that are not smooth. In particular, an analogous convergence result holds for functions, like the Euclidean norm, that are nonsmooth at the minimizer.
We consider optimization algorithms that successively minimize simple Taylor-like models of the objective function. Methods of Gauss-Newton type for minimizing the composition of a convex function and a smooth map are common examples. Our main result is an explicit relationship between the step-size of any such algorithm and the slope of the function at a nearby point. Consequently, we (1) show that the step-sizes can be reliably used to terminate the algorithm, (2) prove that as long as the step-sizes tend to zero, every limit point of the iterates is stationary, and (3) show that conditions, akin to classical quadratic growth, imply that the step-sizes linearly bound the distance of the iterates to the solution set. The latter so-called error bound property is typically used to establish linear (or faster) convergence guarantees. Analogous results hold when the step-size is replaced by the square root of the decrease in the model's value. We complete the paper with extensions to when the models are minimized only inexactly.
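A concrete instance of such a Taylor-like model, for the composite objective $F(x)=h(c(x))$ with $h$ convex and $c$ smooth, is the prox-linear (Gauss-Newton type) subproblem

$$x_{k+1} \in \operatorname{argmin}_{x}\; h\bigl(c(x_k) + \nabla c(x_k)(x-x_k)\bigr) + \frac{1}{2t}\,\Vert x - x_k\Vert^{2},$$

in which notation the step-size referred to above is the quantity $\Vert x_{k+1}-x_k\Vert$, with the parameter $t>0$ playing the role of a step-length control.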
We present a theorem of Sard type for semialgebraic set-valued mappings whose graphs have dimension no larger than that of their range space: the inverse of such a mapping admits a single-valued analytic localization around any pair in the graph, for a generic value parameter. This simple result yields a transparent and unified treatment of generic properties of semialgebraic optimization problems: “typical” semialgebraic problems have finitely many critical points, around each of which they admit a unique “active manifold” (analogue of an active set in nonlinear optimization); moreover, such critical points satisfy strict complementarity and second-order sufficient conditions for optimality are indeed necessary.
The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may linearly bound the "error" -- the distance to the solution set. We explain the observed linear convergence intuitively by proving the equivalence of such an error bound to a natural quadratic growth condition. Our approach generalizes to linear convergence analysis for proximal methods (of Gauss-Newton type) for minimizing compositions of nonsmooth functions with smooth mappings. We observe incidentally that short step-lengths in the algorithm indicate near-stationarity, suggesting a reliable termination criterion.
Steepest descent is central in variational mathematics. We present a new transparent existence proof for curves of near-maximal slope---an influential notion of steepest descent in a nonsmooth setting. The existence theory is further amplified for semialgebraic functions, prototypical nonpathological functions in nonsmooth optimization: such functions always admit nontrivial descent curves emanating from any (even critical) nonminimizing point. We moreover show that curves of near-maximal slope of semialgebraic functions have a more classical description as solutions of subgradient dynamical systems.
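For concreteness, the subgradient dynamical systems referred to here are the differential inclusions

$$\dot{x}(t) \in -\partial f\bigl(x(t)\bigr) \quad \text{for almost every } t \ge 0,$$

where $\partial f$ denotes a suitable subdifferential of the (possibly nonsmooth) function f.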
We present a theorem of Sard type for semi-algebraic set-valued mappings whose graphs have dimension no larger than that of their range space: the inverse of such a mapping admits a single-valued analytic localization around any pair in the graph, for a generic value parameter. This simple result yields a transparent and unified treatment of generic properties of semi-algebraic optimization problems: "typical" semi-algebraic problems have finitely many critical points, around each of which they admit a unique "active manifold" (analogue of an active set in nonlinear optimization); moreover, such critical points satisfy strict complementarity and second-order sufficient conditions for optimality are indeed necessary.
Nonsmoothness pervades optimization, but the way it typically arises is highly structured. Nonsmooth behavior of an objective function is usually associated, locally, with an active manifold: on this manifold the function is smooth, whereas in normal directions it is "vee-shaped." Active set ideas in optimization depend heavily on this structure. Important examples of such functions include the pointwise maximum of some smooth functions and the maximum eigenvalue of a parametrized symmetric matrix. Among possible foundations for practical nonsmooth optimization, this broad class of "partly smooth" functions seems a promising candidate, enjoying a powerful calculus and sensitivity theory. In particular, we show under a natural regularity condition that critical points of partly smooth functions are stable: small perturbations to the function cause small movements of the critical point on the active manifold.
The class of prox-regular functions covers all l.s.c., proper, convex functions, lower-$\mathcal {C}^{2}$ functions and strongly amenable functions, hence a large core of functions of interest in variational analysis and optimization. The subgradient mappings associated with prox-regular functions have unusually rich properties, which are brought to light here through the study of the associated Moreau envelope functions and proximal mappings. Connections are made between second-order epi-derivatives of the functions and proto-derivatives of their subdifferentials. Conditions are identified under which the Moreau envelope functions are convex or strongly convex, even if the given functions are not.
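For reference, the Moreau envelope and proximal mapping mentioned above are, for a parameter $\lambda>0$,

$$e_{\lambda}f(x) = \inf_{w}\Bigl\{ f(w) + \frac{1}{2\lambda}\Vert w - x\Vert^{2} \Bigr\}, \qquad P_{\lambda}f(x) = \operatorname{argmin}_{w}\Bigl\{ f(w) + \frac{1}{2\lambda}\Vert w - x\Vert^{2} \Bigr\}.$$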
There is growing interest in optimization problems with real symmetric matrices as variables. Generally the matrix functions involved are spectral: they depend only on the eigenvalues of the matrix. It is known that convex spectral functions can be characterized exactly as symmetric convex functions of the eigenvalues. A new approach to this characterization is given, via a simple Fenchel conjugacy formula. We then apply this formula to derive expressions for subdifferentials, and to study duality relationships for convex optimization problems with positive semidefinite matrices as variables. Analogous results hold for Hermitian matrices.
The concept of a "class-$C^p$ identifiable surface" of a convex set in Euclidean space is introduced. The paper shows how the smoothness of these surfaces is related to the smoothness of the projection operator and presents finite identification results for certain algorithms for minimization of a function over this set. The work uses a partially geometric view of constrained optimization to generalize previous finite identification results.
The word “tame” is used in the title in the same context as in expressions like “convex optimization,” “nonsmooth optimization,” etc.—as a reference to the class of objects involved in the formulation of optimization problems. Definable and tame functions and mappings associated with various o-minimal structures (e.g. semilinear, semialgebraic, globally subanalytic, and others) have a number of remarkable properties which make them an attractive domain for various applications. This relates both to the power of results that can be obtained and the power of available analytic techniques. The paper surveys certain ideas and recent results, some new, which have been or (hopefully) can be productively used in studies relating to variational analysis and nonsmooth optimization.
In this paper the results of Burke and Moré [5] on the identification of active constraints are extended to the nonconvex constrained nonlinear programming problem. The approach is motivated by the geometric structure of a certain polyhedral convex "linearization" of the constraint region at each iteration. As in Burke and Moré [5] questions of constraint identification are couched in terms of the faces of these polyhedra. The main result employs a nondegeneracy condition due to Dunn [7] and the linear independence condition to obtain a characterization of those algorithms that identify the optimal active constraints in a finite number of iterations. The role of the linear independence condition is carefully examined and it is argued that it is required within the context of a first-order theory of constraint identification. In conclusion, the characterization theorem is applied to the Wilson–Han–Powell sequential quadratic programming algorithm [J. Optim. Theory Appl., 22 (1977), pp. 297–309], [Proceedings of the 1977 Dundee Conference on Numerical Analysis, Springer-Verlag, Berlin, 1977], and [A simplicial algorithm for concave programming, Ph.D. thesis, Graduate School of Business Administration, Harvard University, Boston, 1963] and Fletcher's $QL$ algorithm [Practical Methods for Optimization, Vol. 2, John Wiley, New York, 1981], [Math. Programming Study, 17 (1982), pp. 67–76].
Recently Clarke, Stern and Wolenski characterized, in a Hilbert space, the closed subsets $C$ for which the distance function $d_C$ is continuously differentiable everywhere on an open “tube” of uniform thickness around $C$. Here a corresponding local theory is developed for the property of $d_C$ being continuously differentiable outside of $C$ on some neighborhood of a point $x \in C$. This is shown to be equivalent to the prox-regularity of $C$ at $x$, which is a condition on normal vectors that is commonly fulfilled in variational analysis and has the advantage of being verifiable by calculation. Additional characterizations are provided in terms of $d_C^2$ being locally of class $C^{1+}$ or such that $d_C^2 + \sigma|\cdot|^2$ is convex around $x$ for some $\sigma > 0$. Prox-regularity of $C$ at $x$ corresponds further to the normal cone mapping $N_C$ having a hypomonotone truncation around $x$, and leads to a formula for $P_C$ by way of $N_C$. The local theory also yields new insights on the global level of the Clarke-Stern-Wolenski results, and on a property of sets introduced by Shapiro, as well as on the concept of sets with positive reach considered by Federer in the finite dimensional setting.
We consider linear optimization over a nonempty convex semialgebraic feasible region F. Semidefinite programming is an example. If F is compact, then for almost every linear objective there is a unique optimal solution, lying on a unique “active” manifold, around which F is “partly smooth,” and the second-order sufficient conditions hold. Perturbing the objective results in smooth variation of the optimal solution. The active manifold consists, locally, of these perturbed optimal solutions; it is independent of the representation of F and is eventually identified by a variety of iterative algorithms such as proximal and projected gradient schemes. These results extend to unbounded sets F.
Nondegeneracy conditions that guarantee that the optimal active constraints are identified in a finite number of iterations are studied. Results of this type have only been established for a few algorithms, and then under restrictive hypotheses. The main result is a characterization of those algorithms that identify the optimal constraints in a finite number of iterations. This result is obtained with a nondegeneracy assumption which is equivalent, in the standard nonlinear programming problem, to the assumption that there is a set of strictly complementary Lagrange multipliers. As an important consequence of the authors' results, it is shown how this characterization applies to gradient projection and sequential quadratic programming algorithms.
A spectral function of a Hermitian matrix X is a function which depends only on the eigenvalues of X, $\lambda_1(X) \ge \lambda_2(X) \ge \cdots \ge \lambda_n(X)$, and hence may be written $f(\lambda_1(X), \lambda_2(X), \ldots, \lambda_n(X))$ for some symmetric function f. Such functions appear in a wide variety of matrix optimization problems. We give a simple proof that this spectral function is differentiable at X if and only if the function f is differentiable at the vector $\lambda(X)$, and we give a concise formula for the derivative. We then apply this formula to deduce an analogous expression for the Clarke generalized gradient of the spectral function. A similar result holds for real symmetric matrices.
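The concise derivative formula alluded to above can be stated as follows: if $X = U\,\operatorname{Diag}(\lambda(X))\,U^{\ast}$ is any spectral decomposition of the Hermitian matrix X (with U unitary) and f is differentiable at $\lambda(X)$, then

$$\nabla (f\circ\lambda)(X) = U\,\operatorname{Diag}\bigl(\nabla f(\lambda(X))\bigr)\,U^{\ast}.$$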
Metric regularity is a central concept in variational analysis for the study of solution mappings associated with “generalized equations”, including variational inequalities and parameterized constraint systems. Here it is employed to characterize the distance to irregularity or infeasibility with respect to perturbations of the system structure. Generalizations of the Eckart-Young theorem in numerical analysis are obtained in particular.
Many interesting real functions on Euclidean space are differentiable almost everywhere. All Lipschitz functions have this property, but so, for example, does the spectral abscissa of a matrix (a non-Lipschitz function). In practice, the gradient is often easy to compute. We investigate to what extent we can approximate the Clarke subdifferential of such a function at some point by calculating the convex hull of some gradients sampled at random nearby points.
We compare two recent variational-analytic approaches to second-order conditions and sensitivity analysis for nonsmooth optimization. We describe a broad setting where computing the generalized Hessian of Mordukhovich is easy. In this setting, the idea of tilt stability introduced by Poliquin and Rockafellar is equivalent to a classical smooth second-order condition.
We establish the following result: If the graph of a lower semicontinuous real-extended-valued function $f:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ admits a Whitney stratification (so in particular if f is a semialgebraic function), then the norm of the gradient of f at $x\in\mbox{dom\,}f$ relative to the stratum containing x bounds from below all norms of Clarke subgradients of f at x. As a consequence, we obtain a Morse–Sard type of theorem as well as a nonsmooth extension of the Kurdyka–Łojasiewicz inequality for functions definable in an arbitrary o-minimal structure. It is worthwhile pointing out that, even in a smooth setting, this last result generalizes the one given in [K. Kurdyka, Ann. Inst. Fourier (Grenoble), 48 (1998), pp. 769–783] by removing the boundedness assumption on the domain of the function.
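In one commonly quoted form, the nonsmooth Kurdyka–Łojasiewicz inequality obtained as a consequence reads: for a definable function f and a critical point $\bar x$, there exist $\eta>0$, a neighborhood $U$ of $\bar x$, and a continuous concave function $\varphi:[0,\eta)\to[0,\infty)$ with $\varphi(0)=0$ and $\varphi'>0$ on $(0,\eta)$, such that

$$\varphi'\bigl(f(x)-f(\bar x)\bigr)\,\operatorname{dist}\bigl(0,\partial f(x)\bigr) \ge 1 \quad \text{whenever } x\in U \text{ and } f(\bar x) < f(x) < f(\bar x)+\eta.$$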
At a given point $\bar{p}$, a convex function $f$ is differentiable in a certain subspace $\mathcal{U}$ (the subspace along which $\partial f(\bar{p})$ has 0-breadth). This property opens the way to defining a suitably restricted second derivative of $f$ at $\bar{p}$. We do this via an intermediate function, convex on $\mathcal{U}$. We call this function the $\mathcal{U}$-Lagrangian; it coincides with the ordinary Lagrangian in composite cases: exact penalty, semidefinite programming. Also, we use this new theory to design a conceptual pattern for superlinearly convergent minimization algorithms. Finally, we establish a connection with the Moreau-Yosida regularization.
S. Łojasiewicz proved that if f is a function analytic in a neighborhood of a point a, with f(a)=0, then $\Vert\operatorname{grad} f\Vert \ge |f|^{\alpha}$ with $\alpha<1$. We prove the generalization of this inequality valid in any o-minimal structure. We deduce from it (as in the analytic case) that all gradient trajectories of a function definable in an o-minimal structure have uniformly bounded length. This makes it possible to show that the gradient flow defines a retraction onto a level set.
Certain interesting classes of functions on a real inner product space are invariant under an associated group of orthogonal linear transformations. This invariance can be made explicit via a simple decomposition. For example, rotationally invariant functions on ${\bf R}^2 $ are just even functions of the Euclidean norm, and functions on the Hermitian matrices (with trace inner product) which are invariant under unitary similarity transformations are just symmetric functions of the eigenvalues. We develop a framework for answering geometric and analytic (both classical and nonsmooth) questions about such a function by answering the corresponding question for the (much simpler) function appearing in the decomposition. The aim is to understand and extend the foundations of eigenvalue optimization, matrix approximation, and semidefinite programming.
This paper considers the minimization of a convex integral functional over the positive cone of an $L_p $ space, subject to a finite number of linear equality constraints. Such problems arise in spectral estimation, where the objective function is often entropy-like, and in constrained approximation. The Lagrangian dual problem is finite-dimensional and unconstrained. Under a quasi-interior constraint qualification, the primal and dual values are equal, with dual attainment. Examples show the primal value may not be attained. Conditions are given that ensure that the primal optimal solution can be calculated directly from a dual optimum. These conditions are satisfied in many examples.
The $\epsilon$-pseudospectrum of a matrix A is the subset of the complex plane consisting of all eigenvalues of all complex matrices within a distance $\epsilon$ of A. We are interested in two aspects of "optimization and pseudospectra." The first concerns maximizing the function "real part" over an $\epsilon$-pseudospectrum of a fixed matrix: this defines a function known as the $\epsilon$-pseudospectral abscissa of a matrix. We present a bisection algorithm to compute this function. Our second interest is in minimizing the $\epsilon$-pseudospectral abscissa over a set of feasible matrices. A prerequisite for local optimization of this function is an understanding of its variational properties, the study of which is the main focus of the paper. We show that, in a neighborhood of any nonderogatory matrix, the $\epsilon$-pseudospectral abscissa is a nonsmooth but locally Lipschitz and subdifferentially regular function for sufficiently small $\epsilon$; in fact, it can be expressed locally as the maximum of a finite number of smooth functions. Along the way we obtain an eigenvalue perturbation result: near a nonderogatory matrix, the eigenvalues satisfy a Hölder continuity property on matrix space---a property that is well known when only a single perturbation parameter is considered. The pseudospectral abscissa is a powerful modeling tool: not only is it a robust measure of stability, but it also reveals the transient (as opposed to asymptotic) behavior of associated dynamical systems.
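With distance measured in the spectral norm, the two objects discussed above admit the standard singular-value formulation: the $\epsilon$-pseudospectrum and the $\epsilon$-pseudospectral abscissa of A are

$$\Lambda_{\epsilon}(A) = \{\, z\in\mathbb{C} : \sigma_{\min}(zI - A) \le \epsilon \,\}, \qquad \alpha_{\epsilon}(A) = \max\{\operatorname{Re} z : z\in\Lambda_{\epsilon}(A)\},$$

the characterization typically used in computational algorithms for these quantities.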
Two useful measures of the robust stability of the discrete-time dynamical system $x_{k+1} = Ax_k$ are the $\epsilon$-pseudospectral radius and the numerical radius of A. The $\epsilon$-pseudospectral radius of A is the largest of the moduli of the points in the $\epsilon$-pseudospectrum of A, while the numerical radius is the largest of the moduli of the points in the field of values. We present globally convergent algorithms for computing the $\epsilon$-pseudospectral radius and the numerical radius. For the former algorithm, we discuss conditions under which it is quadratically convergent and provide a detailed accuracy analysis giving conditions under which the algorithm is backward stable. The algorithms are inspired by methods of Byers, Boyd–Balakrishnan, He–Watson and Burke–Lewis–Overton for related problems and depend on computing eigenvalues of symplectic pencils and Hamiltonian matrices.
This paper examines numerical functionals defined on function spaces by means of integrals having certain convexity properties. The functionals are themselves convex, so they can be analysed in the light of the theory of conjugate convex functions, which has recently undergone extensive development. The results obtained are applicable to Orlicz space theory and in the study of various extremum problems in control theory and the calculus of variations.
For the problem of minimizing a lower semicontinuous proper convex function f on a Hilbert space, the proximal point algorithm in exact form generates a sequence $\{ z^k \} $ by taking $z^{k + 1} $ to be the minimizer of $f(z) + ({1 / {2c_k }})\| {z - z^k } \|^2 $, where $c_k > 0$. This algorithm is of interest for several reasons, but especially because of its role in certain computational methods based on duality, such as the Hestenes-Powell method of multipliers in nonlinear programming. It is investigated here in a more general form where the requirement for exact minimization at each iteration is weakened, and the subdifferential $\partial f$ is replaced by an arbitrary maximal monotone operator T. Convergence is established under several criteria amenable to implementation. The rate of convergence is shown to be “typically” linear with an arbitrarily good modulus if $c_k $ stays large enough, in fact superlinear if $c_k \to \infty $. The case of $T = \partial f$ is treated in extra detail. Application is also made to a related case corresponding to minimax problems.
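As a toy illustration (not from the paper), the exact iteration above can be run in closed form when the proximal map is available explicitly; for $f(z)=|z|$ on the real line the minimizer is given by soft-thresholding, and letting $c_k$ grow drives the iterates to the minimizer in a few steps, echoing the superlinear behavior noted above.

    import numpy as np

    # Toy illustration (not from the paper): the exact proximal point iteration for
    # f(z) = |z| on the real line, where the minimizer of f(z) + (1/(2c))*(z - z_k)^2
    # is soft-thresholding of z_k at level c. Growing c_k speeds up convergence.

    def prox_abs(z, c):
        return np.sign(z) * max(abs(z) - c, 0.0)

    z, c = 5.0, 1.0
    for k in range(8):
        z = prox_abs(z, c)
        print(k, z)
        c *= 2.0          # let c_k grow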
The theory of metric regularity is an extension of two classical results: the Lyusternik tangent space theorem and the Graves surjection theorem. Developments in non-smooth analysis in the 1980s and 1990s paved the way for a number of far-reaching extensions of these results. It was also well understood that the phenomena behind the results are of metric origin, not connected with any linear structure. At the same time it became clear that some basic hypotheses of the subdifferential calculus are closely connected with the metric regularity of certain set-valued maps. The survey is devoted to the metric theory of metric regularity and its connection with subdifferential calculus in Banach spaces.