Author Description

Login to generate an author description

Ask a Question About This Mathematician

The state-of-the-art algorithms for solving the trust-region subproblem (TRS) are based on an iterative process, involving solutions of many linear systems, eigenvalue problems, subspace optimization, or line search steps. A … The state-of-the-art algorithms for solving the trust-region subproblem (TRS) are based on an iterative process, involving solutions of many linear systems, eigenvalue problems, subspace optimization, or line search steps. A relatively underappreciated fact, due to Gander, Golub, and von Matt [Linear Algebra Appl., 114 (1989), pp. 815--839], is that TRSs can be solved by one generalized eigenvalue problem, with no outer iterations. In this paper we rediscover this fact and discover its great practicality, which exhibits good performance both in accuracy and efficiency. Moreover, we generalize the approach in various directions, namely by allowing for an ellipsoidal constraint, dealing with the so-called hard case, and obtaining approximate solutions efficiently when high accuracy is unnecessary. We demonstrate that the resulting algorithm is a general-purpose TRS solver, effective both for dense and large-sparse problems, including the so-called hard case. Our algorithm is easy to implement: its essence is a few lines of MATLAB code.
The ν-support vector classification (ν-SVC) algorithm was shown to work well and provide intuitive interpretations, e.g., the parameter ν roughly specifies the fraction of support vectors.Although ν corresponds to a … The ν-support vector classification (ν-SVC) algorithm was shown to work well and provide intuitive interpretations, e.g., the parameter ν roughly specifies the fraction of support vectors.Although ν corresponds to a fraction, it cannot take the entire range between 0 and 1 in its original form.This problem was settled by a non-convex extension of ν-SVC and the extended method was experimentally shown to generalize better than original ν-SVC.However, its good generalization performance and convergence properties of the optimization algorithm have not been studied yet.In this paper, we provide new theoretical insights into these issues and propose a novel ν-SVC algorithm that has guaranteed generalization performance and convergence properties.
We consider solving a nonconvex quadratic minimization problem with two quadratic constraints, one of which being convex. This problem is a generalization of the Celis--Denis--Tapia (CDT) problem and thus we … We consider solving a nonconvex quadratic minimization problem with two quadratic constraints, one of which being convex. This problem is a generalization of the Celis--Denis--Tapia (CDT) problem and thus we refer to it as GCDT (Generalized CDT). The CDT problem has been widely studied, but no polynomial-time algorithm was known until Bienstock's recent work. His algorithm solves the CDT problem in polynomial time with respect to the number of bits in data and $\log\epsilon^{-1}$ by admitting an $\epsilon$ error in the constraints. The algorithm, however, appears to be difficult to implement. In this paper, we present another algorithm for GCDT, which is guaranteed to find a global solution for almost all GCDT instances (and slightly perturbed ones in some exceptionally rare cases), in exact arithmetic (including eigenvalue computation). Our algorithm is based on the approach proposed by Iwata, Nakatsukasa, and Takeda (2015) for computing the signed distance between overlapping ellipsoids. Our algorithm computes all the Lagrange multipliers of GCDT by solving a two-parameter linear eigenvalue problem, obtains the corresponding KKT points, and finds a global solution as the KKT point with the smallest objective value. In practice, in finite precision arithmetic, our algorithm requires $O(n^6\log\log u^{-1})$ computational time, where $n$ is the number of variables and $u$ is the unit roundoff. Although we derive our algorithm under the unrealistic assumption that exact eigenvalues can be computed, numerical experiments illustrate that our algorithm performs well in finite precision arithmetic.
DEMiCs is a software package written in C++ for computing the mixed volume of the Newton polytopes of a general semi-mixed polynomial system through dynamic enumeration of all mixed cells. … DEMiCs is a software package written in C++ for computing the mixed volume of the Newton polytopes of a general semi-mixed polynomial system through dynamic enumeration of all mixed cells. The underlying mixed cells play an essential role for computing all isolated zeros of a polynomial system by polyhedral homotopy continuation method. A notable feature of DEMiCs is in the construction of a dynamic enumeration tree for finding all mixed cells. The dynamic enumeration method, proposed by Mizutani, Takeda and Kojima for fully mixed polynomial systems, is extended to semi-mixed systems and incorporated in the package. Numerical results show that DEMiCs significantly is faster than existing software packages for semi-mixed polynomial systems with many distinct supports. The software package DEMiCs is available at http://www.is.titech.ac.jp/~mizutan8/DEMiCs/.
An interesting combinatorial (enumeration) problem arises in the initial phase of the polyhedral homotopy continuation method for computing all solutions of a polynomial equation system in complex variables. It is … An interesting combinatorial (enumeration) problem arises in the initial phase of the polyhedral homotopy continuation method for computing all solutions of a polynomial equation system in complex variables. It is formulated as a problem of finding all solutions of a specially structured system of linear inequalities with a certain additional combinatorial condition. This paper presents a computational method for the problem fully utilizing the duality theory and the simplex method for linear programs, and report numerical results on a single cpu implementation and a parallel cpu implementation of the method.
We propose a two-class linear classification model by taking into account the Euclidean distance from each data point to the discriminant hyperplane and introducing a risk measure which is known … We propose a two-class linear classification model by taking into account the Euclidean distance from each data point to the discriminant hyperplane and introducing a risk measure which is known as the conditional value-at-risk in financial risk management. It is formulated as a nonconvex programming problem and we present a solution method for obtaining either a globally or a locally optimal solution by examining the special structure of the problem. Also, this model is proved to be equivalent to the ν-support vector classification under some parameter setting, and numerical experiments show that the proposed model has better predictive accuracy in general.
We address the minimization of a smooth objective function under an $\ell_0$-constraint and simple convex constraints. When the problem has no constraints except the $\ell_0$-constraint, some efficient algorithms are available; … We address the minimization of a smooth objective function under an $\ell_0$-constraint and simple convex constraints. When the problem has no constraints except the $\ell_0$-constraint, some efficient algorithms are available; for example, Proximal DC (Difference of Convex functions) Algorithm (PDCA) repeatedly evaluates closed-form solutions of convex subproblems, leading to a stationary point of the $\ell_0$-constrained problem. However, when the problem has additional convex constraints, they become inefficient because it is difficult to obtain closed-form solutions of the associated subproblems. In this paper, we reformulate the problem by employing a new DC representation of the $\ell_0$-constraint, so that PDCA can retain the efficiency by reducing its subproblems to the projection operation onto a convex set. Moreover, inspired by the Nesterov's acceleration technique for proximal methods, we propose the Accelerated PDCA (APDCA), which attains the optimal convergence rate if applied to convex programs, and performs well in numerical experiments.
Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph G … Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph G = (V, E) with an edge-weight vector w = (W <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">e</sub> ) <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">e∈E</sub> , we aim to find a subset of vertices S that maximizes the density, i.e., w(S) / |S|, where w(S) is the sum of the weights of the edges in the subgraph induced by S. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption; it is assumed that the weights of all the edges are known exactly as input. In real-world applications, there are often cases where we have only uncertain information of the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address such an uncertainty issue using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle that models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms.
There are two main approaches to binary classification problems: the loss function approach and the uncertainty set approach. The loss function approach is widely used in real-world data analysis. Statistical … There are two main approaches to binary classification problems: the loss function approach and the uncertainty set approach. The loss function approach is widely used in real-world data analysis. Statistical decision theory has been used to elucidate its properties such as statistical consistency. Conditional probabilities can also be estimated by using the minimum solution of the loss function. In the uncertainty set approach, an uncertainty set is defined for each binary label from training samples. The best separating hyperplane between the two uncertainty sets is used as the decision function. Although the uncertainty set approach provides an intuitive understanding of learning algorithms, its statistical properties have not been sufficiently studied. In this paper, we show that the uncertainty set is deeply connected with the convex conjugate of a loss function. On the basis of the conjugate relation, we propose a way of revising the uncertainty set approach so that it will have good statistical properties such as statistical consistency. We also introduce statistical models corresponding to uncertainty sets in order to estimate conditional probabilities. Finally, we present numerical experiments, verifying that the learning with revised uncertainty sets improves the prediction accuracy.
We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0<p\le 1$. The concerned bilevel optimization problem has a nonsmooth, possibly nonconvex, … We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0<p\le 1$. The concerned bilevel optimization problem has a nonsmooth, possibly nonconvex, $\ell_p$-regularized problem as the lower-level problem. Despite the recent popularity of nonconvex $\ell_p$ regularizer and the usefulness of bilevel optimization for selecting hyperparameters, algorithms for such bilevel problems have not been studied because of the difficulty of $\ell_p$ regularizer. We first show new optimality conditions for such bilevel optimization problems and then propose a smoothing-type algorithm together with convergence analysis. The proposed algorithm is simple and scalable as our numerical comparison to Bayesian optimization and grid search indicates. It is a promising algorithm for nonsmooth nonconvex bilevel optimization problems as the first algorithm with convergence guarantee.
Abstract A new Levenberg–Marquardt (LM) method for solving nonlinear least squares problems with convex constraints is described. Various versions of the LM method have been proposed, their main differences being … Abstract A new Levenberg–Marquardt (LM) method for solving nonlinear least squares problems with convex constraints is described. Various versions of the LM method have been proposed, their main differences being in the choice of a damping parameter. In this paper, we propose a new rule for updating the parameter so as to achieve both global and local convergence even under the presence of a convex constraint set. The key to our results is a new perspective of the LM method from majorization-minimization methods. Specifically, we show that if the damping parameter is set in a specific way, the objective function of the standard subproblem in LM methods becomes an upper bound on the original objective function under certain standard assumptions. Our method solves a sequence of the subproblems approximately using an (accelerated) projected gradient method. It finds an $$\varepsilon$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ε</mml:mi> </mml:math> -stationary point after $$O(\varepsilon ^{-2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>ε</mml:mi> <mml:mrow> <mml:mo>-</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> computation and achieves local quadratic convergence for zero-residual problems under a local error bound condition. Numerical results on compressed sensing and matrix factorization show that our method converges faster in many cases than existing methods.
Financial risk measures have been used recently in machine learning. For example, ν-support vector machine ν-SVM) minimizes the conditional value at risk (CVaR) of margin distribution. The measure is popular … Financial risk measures have been used recently in machine learning. For example, ν-support vector machine ν-SVM) minimizes the conditional value at risk (CVaR) of margin distribution. The measure is popular in finance because of the subadditivity property, but it is very sensitive to a few outliers in the tail of the distribution. We propose a new classification method, extended robust SVM (ER-SVM), which minimizes an intermediate risk measure between the CVaR and value at risk (VaR) by expecting that the resulting model becomes less sensitive than ν-SVM to outliers. We can regard ER-SVM as an extension of robust SVM, which uses a truncated hinge loss. Numerical experiments imply the ER-SVM's possibility of achieving a better prediction performance with proper parameter setting.
In this paper we consider robust classications and show equivalence between the regularized classications. In general, robust classications are used to create a classier robust to data by taking into … In this paper we consider robust classications and show equivalence between the regularized classications. In general, robust classications are used to create a classier robust to data by taking into account the uncertainty of the data. Our result shows that regularized classications inherit robustness
The software package BBCPOP is a MATLAB implementation of a hierarchy of sparse doubly nonnegative relaxations of a class of polynomial optimization (minimization) problems (POPs) with binary, box, and complementarity … The software package BBCPOP is a MATLAB implementation of a hierarchy of sparse doubly nonnegative relaxations of a class of polynomial optimization (minimization) problems (POPs) with binary, box, and complementarity (BBC) constraints. Given a POP in the class and a relaxation order, BBCPOP constructs a simple conic optimization problem (COP), which serves as a doubly nonnegative relaxation of the POP, and then solves the COP by applying the bisection and projection method. The COP is expressed with a linear objective function and constraints described as a single hyperplane and two cones, which are the Cartesian product of positive semidefinite cones and a polyhedral cone induced from the BBC constraints. BBCPOP aims to compute a tight lower bound for the optimal value of a large-scale POP in the class that is beyond the comfort zone of existing software packages. The robustness, reliability, and efficiency of BBCPOP are demonstrated in comparison to the state-of-the-art software SDP package SDPNAL+ on randomly generated sparse POPs of degree 2 and 3 with up to a few thousands variables, and ones of degree from 5 to 8 with up to a few hundred variables. Numerical results on BBC-constrained POPs that arise from quadratic assignment problems are also reported. The software package BBCPOP is available at https://sites.google.com/site/bbcpop1/.
Computing the signed distance between two ellipsoids is a convex optimization problem when the two ellipsoids have no intersection, but it becomes nonconvex when the ellipsoids overlap. Efficient algorithms for … Computing the signed distance between two ellipsoids is a convex optimization problem when the two ellipsoids have no intersection, but it becomes nonconvex when the ellipsoids overlap. Efficient algorithms for convex optimization problems are thus not guaranteed to find the correct signed distance between overlapping ellipsoids. In this paper, we first show that computing the signed distance is equivalent to minimizing the norm along the boundary of the Minkowski difference. We then derive an algorithm with running time $O(n^6)$, where $n$ is the dimension of the ellipsoids, that obtains a global minimizer on the boundary of the Minkowski difference and hence provides the exact signed distance. The algorithm first finds all the points that satisfy the Karush--Kuhn--Tucker (KKT) conditions, and then identifies a relevant KKT point with the smallest signed distance. The primary difficulty in computing the KKT points is that they are the solutions of two bivariate rational equations, whose poles are not known explicitly. Our key step is to convert the rational equations into polynomial equations, which we do by constructing certain bivariate matrix pencils whose zeros of determinants are the zeros of the rational functions. This reduces the problem to a two-parameter quadratic eigenvalue problem, which can be solved via a single-parameter linear eigenvalue problem of larger (squared) size, for which reliable algorithms are available. Thus we provide the first algorithm for computing the signed distance between overlapping ellipsoids with polynomial complexity.
For binary polynomial optimization problems (POPs) of degree $d$ with $n$ variables, we prove that the $\lceil(n+d-1)/2\rceil$th semidefinite programming (SDP) relaxation in Lasserre's hierarchy of SDP relaxations provides the exact … For binary polynomial optimization problems (POPs) of degree $d$ with $n$ variables, we prove that the $\lceil(n+d-1)/2\rceil$th semidefinite programming (SDP) relaxation in Lasserre's hierarchy of SDP relaxations provides the exact optimal value. If binary POPs involve only even-degree monomials, we show that it can be further reduced to $\lceil(n+d-2)/2\rceil$. This bound on the relaxation order coincides with the conjecture by Laurent in 2003, which was recently proved by Fawzi, Saunderson, and Parrilo, on binary quadratic optimization problems where $d=2$. We also numerically confirm that the bound is tight. More precisely, we present instances of binary POPs that require solving at least the $\lceil(n+d-1)/2\rceil$th SDP relaxation for general binary POPs and the $\lceil(n+d-2)/2\rceil$th SDP relaxation for even-degree binary POPs to obtain the exact optimal values.
We consider optimization problems on Riemannian manifolds with equality and inequality constraints, which we call Riemannian nonlinear optimization (RNLO) problems. Although they have numerous applications, the existing studies on them … We consider optimization problems on Riemannian manifolds with equality and inequality constraints, which we call Riemannian nonlinear optimization (RNLO) problems. Although they have numerous applications, the existing studies on them are limited especially in terms of algorithms. In this paper, we propose Riemannian sequential quadratic optimization (RSQO) that uses a line-search technique with an $\ell_{1}$ penalty function as an extension of the standard SQO algorithm for constrained nonlinear optimization problems in Euclidean spaces to Riemannian manifolds. We prove its global convergence to a Karush--Kuhn--Tucker point of the RNLO problem by means of parallel transport and the exponential mapping. Furthermore, we establish its local quadratic convergence by analyzing the relationship between sequences generated by RSQO and the Riemannian Newton method. Ours is the first algorithm that has both global and local convergence properties for constrained nonlinear optimization on Riemannian manifolds. Empirical results show that RSQO finds solutions more stably and with higher accuracy compared with the existing Riemannian penalty and augmented Lagrangian methods.
This paper discusses computational complexity of conceptual successive convex relaxation methods proposed by Kojima and Tunçel for approximating a convex relaxation of a compact subset F = {x ∈ C … This paper discusses computational complexity of conceptual successive convex relaxation methods proposed by Kojima and Tunçel for approximating a convex relaxation of a compact subset F = {x ∈ C 0 : p(x) ≤ 0 (∀p(·) ∈ 𝒫 F )} of the n-dimensional Euclidean space R n . Here, C 0 denotes a nonempty compact convex subset of R n , and 𝒫 F a set of finitely or infinitely many quadratic functions. We evaluate the number of iterations which the successive convex relaxation methods require to attain a convex relaxation of F with a given accuracy ε, in terms of ε, the diameter of C 0 , the diameter of F, and some other quantities characterizing the Lipschitz continuity, the nonlinearity, and the nonconvexity of the set 𝒫 F of quadratic functions.
Nonconvex variants of support vector machines (SVMs) have been developed for various purposes. For example, robust SVMs attain robustness to outliers by using a nonconvex loss function, while extended [Formula: … Nonconvex variants of support vector machines (SVMs) have been developed for various purposes. For example, robust SVMs attain robustness to outliers by using a nonconvex loss function, while extended [Formula: see text]-SVM (E[Formula: see text]-SVM) extends the range of the hyperparameter by introducing a nonconvex constraint. Here, we consider an extended robust support vector machine (ER-SVM), a robust variant of E[Formula: see text]-SVM. ER-SVM combines two types of nonconvexity from robust SVMs and E[Formula: see text]-SVM. Because of the two nonconvexities, the existing algorithm we proposed needs to be divided into two parts depending on whether the hyperparameter value is in the extended range or not. The algorithm also heuristically solves the nonconvex problem in the extended range. In this letter, we propose a new, efficient algorithm for ER-SVM. The algorithm deals with two types of nonconvexity while never entailing more computations than either E[Formula: see text]-SVM or robust SVM, and it finds a critical point of ER-SVM. Furthermore, we show that ER-SVM includes the existing robust SVMs as special cases. Numerical experiments confirm the effectiveness of integrating the two nonconvexities.
Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problem is quite limited, and until … Our work focuses on stochastic gradient methods for optimizing a smooth non-convex loss function with a non-smooth non-convex regularizer. Research on this class of problem is quite limited, and until recently no non-asymptotic convergence results have been reported. We present two simple stochastic gradient algorithms, for finite-sum and general stochastic optimization problems, which have superior convergence complexities compared to the current state-of-the-art. We also compare our algorithms' performance in practice for empirical risk minimization.
Positive systems can be used as mathematical models for many practical systems, such as biological systems, communication networks, and interconnected systems. In this letter, we propose proximal alternating linearized minimization … Positive systems can be used as mathematical models for many practical systems, such as biological systems, communication networks, and interconnected systems. In this letter, we propose proximal alternating linearized minimization (PALM) and PALM-like algorithms to determine the nearest discrete-time linear positive system to a given system, with the same order as that of the considered system. Global convergence of the PALM algorithm to a critical point of the considered objective function is ensured by using the Kurdyka-Łojasiewicz and semi-algebraic properties. Numerical experiments are performed to compare the PALM and PALM-like algorithms.
In this letter, we formulate two controllability maximization problems for large-scale networked dynamical systems such as brain networks: The first problem is a sparsity constraint optimization problem with a box … In this letter, we formulate two controllability maximization problems for large-scale networked dynamical systems such as brain networks: The first problem is a sparsity constraint optimization problem with a box constraint. The second problem is a modified problem of the first problem, in which the state transition matrix is Metzler. In other words, the second problem is a realization problem for a positive system. We develop a projected gradient method for solving the problems, and prove global convergence to a stationary point with locally linear convergence rate. The projections onto the constraints of the first and second problems are given explicitly. Numerical experiments using the proposed method provide some results that are difficult to deduce theoretically. In particular, the controllability characteristic is observed to change with increase in the parameter specifying sparsity, and the change rate appears to be dependent on the network structure.
Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the … Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with the $T$ steepest descent update equations with some prechosen iteration number $T$. In this paper, we have developed a gradient-based algorithm for multilevel optimization with $n$ levels based on their idea and proved that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with some theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model considering data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings.
Solving large scale optirnization problems requires a huge amount of computational power.The size of optimization problems tliat can be solved on a few CPUs has been limited due to a … Solving large scale optirnization problems requires a huge amount of computational power.The size of optimization problems tliat can be solved on a few CPUs has been limited due to a lack of computational power.Grid ancl cluster cornputing has received much attention as a power'ful and inexpensive way ofsolving Iarge scale optimization problems that ap existing $ingle-unit CPU cannot process.Tlte aim of this paper is to show that grid and eluster computing provides tremendous power to optimization methods.The methods that this paper picks up are a successive convex relaxation method for quadratic optimization problems, a polyhedral homotopy method for polyiiomial systems ef equations and a prlmal-dual interior- pelnt metltod for semidefinlte programs.Their parallel implementatiens on grids and clusters together with numerical results are reported.The paper also mentions a grid poTtal system for optin}ization problems briefly.
A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this … A wide variety of machine learning algorithms such as support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this paper is to provide a unified classification model that includes the above models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVM become applicable to MPM and FDA, and vice versa. Another benefit is to provide theoretical results to above learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and propose a non-convex optimization algorithm that can be applied to non-convex variants of existing learning methods.
We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose … We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. This kind of problems arises naturally in various applications when different regularizers are introduced for inducing simultaneous structures in the solutions. Solving these problems, however, can be challenging because of the coupled nonsmooth functions: the corresponding proximal mapping can be hard to compute so that standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving this kind of problems. In this algorithm, we approximate the nonsmooth functions by their Moreau envelopes in each iteration. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous {\em difference-of-convex} functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques. These first-order methods can be implemented efficiently thanks to the fact that the proximal mapping of {\em each} nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by our method is bounded and any accumulation point is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and illustrate the performance numerically for these two specific applications.
This paper is devoted to the study of generalized subdifferentials of spectral functions over Euclidean Jordan algebras. Spectral functions appear often in optimization problems playing the role of "regularizer," ``barrier," … This paper is devoted to the study of generalized subdifferentials of spectral functions over Euclidean Jordan algebras. Spectral functions appear often in optimization problems playing the role of "regularizer," ``barrier," ``penalty function," and many others. We provide formulae for the regular, approximate, and horizon subdifferentials of spectral functions. In addition, under local lower semicontinuity, we also furnish a formula for the Clarke subdifferential, thus extending an earlier result by Baes. As application, we compute the generalized subdifferentials of the function that maps an element to its $k$th largest eigenvalue. Furthermore, in connection with recent approaches for nonsmooth optimization, we present a study of the Kurdyka--Łojasiewicz (KL) property for spectral functions and prove a transfer principle for the KL-exponent. In our proofs, we make extensive use of recent tools such as the commutation principle of Ramírez, Seeger, and Sossa and majorization principles developed by Gowda.
Estimation of common dynamics among several observed signals occurs in signal processing, system theory, and computer algebra problems. In this paper, we propose subspace methods for common linear time-invariant dynamics … Estimation of common dynamics among several observed signals occurs in signal processing, system theory, and computer algebra problems. In this paper, we propose subspace methods for common linear time-invariant dynamics detection and estimation. First, we consider the deterministic problem of detection of common dynamics when the data is exact (noise free). Then, we consider the stochastic estimation problem when the data is corrupted by white Gaussian noise. The subspace identification methods proposed have a system theoretic interpretation of finding the intersection of autonomous linear time-invariant behaviors. They are computationally fast but statistically less accurate than alternative optimization methods. Development of local optimization-based methods for common dynamics estimation is a topic of future work.
In this work we address the Ev-SVM model proposed by Perez-Cruz et al. as an extension of the traditional v support vector classification model (v-SVM). Through an enhancement of the … In this work we address the Ev-SVM model proposed by Perez-Cruz et al. as an extension of the traditional v support vector classification model (v-SVM). Through an enhancement of the range of admissible values for the regularization parameter v, the Ev-SVM has been shown to be able to produce a wider variety of decision functions, giving rise to a better adaptability to the data. However, while a clear and intuitive geometric interpretation can be given for the v-SVM model as a nearest-point problem in reduced convex hulls (RCH-NPP), no previous work has been made in developing such intuition for the Ev-SVM model. In this paper we show how Ev-SVM can be reformulated as a geometrical problem that generalizes RCH-NPP, providing new insights into this model. Under this novel point of view, we propose the RapMinos algorithm, able to solve Ev-SVM more efficiently than the current methods. Furthermore, we show how RapMinos is able to address the Ev-SVM model for any choice of regularization norm lp ≥1 seamlessly, which further extends the SVM model flexibility beyond the usual Ev-SVM models.
We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0<p\le 1$. The concerned bilevel optimization problem has a nonsmooth, possibly nonconvex, … We propose a bilevel optimization strategy for selecting the best hyperparameter value for the nonsmooth $\ell_p$ regularizer with $0<p\le 1$. The concerned bilevel optimization problem has a nonsmooth, possibly nonconvex, $\ell_p$-regularized problem as the lower-level problem. Despite the recent popularity of nonconvex $\ell_p$-regularizer and the usefulness of bilevel optimization for selecting hyperparameters, algorithms for such bilevel problems have not been studied because of the difficulty of $\ell_p$-regularizer. Our contribution is the proposal of the first algorithm equipped with a theoretical guarantee for finding the best hyperparameter of $\ell_p$-regularized supervised learning problems. Specifically, we propose a smoothing-type algorithm for the above mentioned bilevel optimization problems and provide a theoretical convergence guarantee for the algorithm. Indeed, since optimality conditions are not known for such bilevel optimization problems so far, new necessary optimality conditions, which are called the SB-KKT conditions, are derived and it is shown that a sequence generated by the proposed algorithm actually accumulates at a point satisfying the SB-KKT conditions under some mild assumptions. The proposed algorithm is simple and scalable as our numerical comparison to Bayesian optimization and grid search indicates.
Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations … Variable clustering is important for explanatory analysis. However, only few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating for example with the Bayesian information criteria (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligible small, but not necessarily zero, partial correlations. Based on our model, we propose to evaluate a variable clustering result using the marginal likelihood. To address the intractable calculation of the marginal likelihood, we propose two solutions: one based on a variational approximation and another based on MCMC. Experiments on simulated data show that the proposed method is similarly accurate as BIC in the no noise setting, but considerably more accurate when there are noisy partial correlations. Furthermore, on real data the proposed method provides clustering results that are intuitively sensible, which is not always the case when using BIC or its extensions.
We consider Riemannian inequality-constrained optimization problems and propose a Riemannian primal-dual interior point trust region method (RIPTRM) for solving them. We prove its global convergence to an approximate Karush-Kuhn-Tucker point … We consider Riemannian inequality-constrained optimization problems and propose a Riemannian primal-dual interior point trust region method (RIPTRM) for solving them. We prove its global convergence to an approximate Karush-Kuhn-Tucker point and a second-order stationary point. We also establish the local near-quadratic convergence. To the best of our knowledge, this is the first algorithm that incorporates the trust region strategy and has the second-order convergence property for optimization problems on Riemannian manifolds with nonlinear inequality constraints. It is also the first Riemannian interior point method that possesses both global and local convergence properties. We conduct numerical experiments in which we introduce a truncated conjugate gradient method and an eigenvalue-based subsolver for RIPTRM to approximately and exactly solve the trust region subproblems, respectively. Empirical results show that RIPTRMs find solutions with higher accuracy compared to an existing Riemannian interior point method and other algorithms. Additionally, we observe that RIPTRM with the exact search direction shows significantly promising performance in an instance where the Hessian of the Lagrangian has a large negative eigenvalue.
The Douglas-Rachford algorithm is a classic splitting method for finding a zero of the sum of two maximal monotone operators. It has also been applied to settings that involve one … The Douglas-Rachford algorithm is a classic splitting method for finding a zero of the sum of two maximal monotone operators. It has also been applied to settings that involve one weakly and one strongly monotone operator. In this work, we extend the Douglas-Rachford algorithm to address multioperator inclusion problems involving $m$ ($m\geq 2$) weakly and strongly monotone operators, reformulated as a two-operator inclusion in a product space. By selecting appropriate parameters, we establish the convergence of the algorithm to a fixed point, from which solutions can be extracted. Furthermore, we illustrate its applicability to sum-of-$m$-functions minimization problems characterized by weakly convex and strongly convex functions. For general nonconvex problems in finite-dimensional spaces, comprising Lipschitz continuously differentiable functions and a proper closed function, we provide global subsequential convergence guarantees.
Graph drawing is a fundamental task in information visualization, with the Fruchterman$\unicode{x2013}$Reingold (FR) force model being one of the most popular choices. We can interpret this visualization task as a … Graph drawing is a fundamental task in information visualization, with the Fruchterman$\unicode{x2013}$Reingold (FR) force model being one of the most popular choices. We can interpret this visualization task as a continuous optimization problem, which can be solved using the FR algorithm, the original algorithm for this force model, or the L-BFGS algorithm, a quasi-Newton method. However, both algorithms suffer from twist problems and are computationally expensive per iteration, which makes achieving high-quality visualizations for large-scale graphs challenging. In this research, we propose a new initial placement based on the stochastic coordinate descent to accelerate the optimization process. We first reformulate the problem as a discrete optimization problem using a hexagonal lattice and then iteratively update a randomly selected vertex along the coordinate Newton direction. We can use the FR or L-BFGS algorithms to obtain the final placement. We demonstrate the effectiveness of our proposed approach through experiments, highlighting the potential of coordinate descent methods for graph drawing tasks. Additionally, we suggest combining our method with other graph drawing techniques for further improvement. We also discuss the relationship between our proposed method and broader graph-related applications.
In this study, we consider an optimization problem with uncertainty dependent on decision variables, which has recently attracted attention due to its importance in machine learning and pricing applications. In … In this study, we consider an optimization problem with uncertainty dependent on decision variables, which has recently attracted attention due to its importance in machine learning and pricing applications. In this problem, the gradient of the objective function cannot be obtained explicitly because the decision-dependent distribution is unknown. Therefore, several zeroth-order methods have been proposed, which obtain noisy objective values by sampling and update the iterates. Although these existing methods have theoretical convergence for optimization problems with decision-dependent uncertainty, they require strong assumptions about the function and distribution or exhibit large variances in their gradient estimators. To overcome these issues, we propose two zeroth-order methods under mild assumptions. First, we develop a zeroth-order method with a new one-point gradient estimator including a variance reduction parameter. The proposed method updates the decision variables while adjusting the variance reduction parameter. Second, we develop a zeroth-order method with a two-point gradient estimator. There are situations where only one-point estimators can be used, but if both one-point and two-point estimators are available, it is more practical to use the two-point estimator. As theoretical results, we show the convergence of our methods to stationary points and provide the worst-case iteration and sample complexity analysis. Our simulation experiments with real data on a retail service application show that our methods output solutions with lower objective values than the conventional zeroth-order methods.
Abstract In this paper, we propose the approximate Bregman proximal gradient algorithm (ABPG) for solving composite nonconvex optimization problems. ABPG employs a new distance that approximates the Bregman distance, making … Abstract In this paper, we propose the approximate Bregman proximal gradient algorithm (ABPG) for solving composite nonconvex optimization problems. ABPG employs a new distance that approximates the Bregman distance, making the subproblem of ABPG simpler to solve compared to existing Bregman-type algorithms. The subproblem of ABPG is often expressed in a closed form. Similarly to existing Bregman-type algorithms, ABPG does not require the global Lipschitz continuity for the gradient of the smooth part. Instead, assuming the smooth adaptable property, we establish the global subsequential convergence under standard assumptions. Additionally, assuming that the Kurdyka–Łojasiewicz property holds, we prove the global convergence for a special case. Our numerical experiments on the $$\ell _p$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:msub> <mml:mi>ℓ</mml:mi> <mml:mi>p</mml:mi> </mml:msub> </mml:math> regularized least squares problem, the $$\ell _p$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:msub> <mml:mi>ℓ</mml:mi> <mml:mi>p</mml:mi> </mml:msub> </mml:math> loss problem, and the nonnegative linear system show that ABPG outperforms existing algorithms especially when the gradient of the smooth part is not globally Lipschitz or even locally Lipschitz continuous.
Abstract Minimizing the sum of a convex function and a composite function appears in various fields. The generalized Levenberg–Marquardt (LM) method, also known as the prox-linear method, has been developed … Abstract Minimizing the sum of a convex function and a composite function appears in various fields. The generalized Levenberg–Marquardt (LM) method, also known as the prox-linear method, has been developed for such optimization problems. The method iteratively solves strongly convex subproblems with a damping term. This study proposes a new generalized LM method for solving the problem with a smooth composite function. The method enjoys three theoretical guarantees: iteration complexity bound, oracle complexity bound, and local convergence under a Hölderian growth condition. The local convergence results include local quadratic convergence under the quadratic growth condition; this is the first to extend the classical result for least-squares problems to a general smooth composite function. In addition, this is the first LM method with both an oracle complexity bound and local quadratic convergence under standard assumptions. These results are achieved by carefully controlling the damping parameter and solving the subproblems by the accelerated proximal gradient method equipped with a particular termination condition. Experimental results show that the proposed method performs well in practice for several instances, including classification with a neural network and nonnegative matrix factorization.
By using the squared slack variables technique, we show that a general polynomial complementarity problem can be formulated as a system of polynomial equations. Thus, the solution set of such … By using the squared slack variables technique, we show that a general polynomial complementarity problem can be formulated as a system of polynomial equations. Thus, the solution set of such a problem is the image of a real algebraic set under a certain projection. This paper points out that, generically, this polynomial system has finitely many complex zeros. In such a case, we use techniques from symbolic computation to compute a univariate representation of the solution set. Consequently, univariate representations of special solutions, such as least-norm and sparse solutions, are obtained. After that, enumerating solutions boils down to solving problems governed by univariate polynomials. We also provide some experiments on small-scale problems with worst-case scenarios. At the end of the paper, we propose a method for computing approximate solutions to copositive polynomial complementarity problems that may have infinitely many solutions.
We discuss the problem of projecting a point onto an arbitrary hyperbolicity cone from both theoretical and numerical perspectives. While hyperbolicity cones are furnished with a generalization of the notion … We discuss the problem of projecting a point onto an arbitrary hyperbolicity cone from both theoretical and numerical perspectives. While hyperbolicity cones are furnished with a generalization of the notion of eigenvalues, obtaining closed form expressions for the projection operator as in the case of semidefinite matrices is an elusive endeavour. To address that we propose a Frank-Wolfe method to handle this task and, more generally, strongly convex optimization over closed convex cones. One of our innovations is that the Frank-Wolfe method is actually applied to the dual problem and, by doing so, subproblems can be solved in closed-form using minimum eigenvalue functions and conjugate vectors. To test the validity of our proposed approach, we present numerical experiments where we check the performance of alternative approaches including interior point methods and an earlier accelerated gradient method proposed by Renegar. We also show numerical examples where the hyperbolic polynomial has millions of monomials. Finally, we also discuss the problem of projecting onto p-cones which, although not hyperbolicity cones in general, are still amenable to our techniques.
In this work, we address a class of nonconvex nonsmooth optimization problems where the objective function is the sum of two smooth functions (one of which is proximable) and two … In this work, we address a class of nonconvex nonsmooth optimization problems where the objective function is the sum of two smooth functions (one of which is proximable) and two nonsmooth functions (one weakly convex and proximable and the other continuous and weakly concave). We introduce a new splitting algorithm that extends the Davis-Yin splitting (DYS) algorithm to handle such four-term nonconvex nonsmooth problems. We prove that with appropriately chosen step sizes, our algorithm exhibits global subsequential convergence to stationary points with a stationarity measure converging at a rate of $1/k$. When specialized to the setting of the DYS algorithm, our results allow for larger stepsizes compared to existing bounds in the literature. Experimental results demonstrate the practical applicability and effectiveness of our proposed algorithm.
Random projection, a dimensionality reduction technique, has been found useful in recent years for reducing the size of optimization problems. In this paper, we explore the use of sparse sub-gaussian … Random projection, a dimensionality reduction technique, has been found useful in recent years for reducing the size of optimization problems. In this paper, we explore the use of sparse sub-gaussian random projections to approximate semidefinite programming (SDP) problems by reducing the size of matrix variables, thereby solving the original problem with much less computational effort. We provide some theoretical bounds on the quality of the projection in terms of feasibility and optimality that explicitly depend on the sparsity parameter of the projector. We investigate the performance of the approach for semidefinite relaxations appearing in polynomial optimization, with a focus on combinatorial optimization problems. In particular, we apply our method to the semidefinite relaxations of MAXCUT and MAX-2-SAT. We show that for large unweighted graphs, we can obtain a good bound by solving a projection of the semidefinite relaxation of MAXCUT. We also explore how to apply our method to find the stability number of four classes of imperfect graphs by solving a projection of the second level of the Lasserre Hierarchy. Overall, our computational experiments show that semidefinite programming problems appearing as relaxations of combinatorial optimization problems can be approximately solved using random projections as long as the number of constraints is not too large.
We propose the Random Subspace Homogenized Trust Region (RSHTR) method, which efficiently solves high-dimensional non-convex optimization problems by identifying descent directions within randomly selected subspaces. RSHTR provides the strongest theoretical … We propose the Random Subspace Homogenized Trust Region (RSHTR) method, which efficiently solves high-dimensional non-convex optimization problems by identifying descent directions within randomly selected subspaces. RSHTR provides the strongest theoretical guarantees among random subspace algorithms for non-convex optimization, achieving an $\varepsilon$-approximate first-order stationary point in $O(\varepsilon^{-3/2})$ iterations and converging locally at a linear rate. Furthermore, under rank-deficient conditions, RSHTR satisfies $\varepsilon$-approximate second-order necessary condition in $O(\varepsilon^{-3/2})$ iterations and exhibits a local quadratic convergence.
We propose a new first-order method for minimizing nonconvex functions with a Lipschitz continuous gradient and Hessian.The proposed method is an accelerated gradient descent with two restart mechanisms and finds … We propose a new first-order method for minimizing nonconvex functions with a Lipschitz continuous gradient and Hessian.The proposed method is an accelerated gradient descent with two restart mechanisms and finds a solution where the gradient norm is less than ε in O(ε -7/4 ) function and gradient evaluations.Unlike existing first-order methods with similar complexity bounds, our algorithm is parameter-free because it requires no prior knowledge of problem-dependent parameters, e.g., the Lipschitz constants and the target accuracy ε.The main challenge in achieving this advantage is estimating the Lipschitz constant of the Hessian using only first-order information.To this end, we develop a new Hessian-free analysis based on two technical inequalities: a Jensen-type inequality for gradients and an error bound for the trapezoidal rule.Several numerical results illustrate that the proposed method performs comparably to existing algorithms with similar complexity bounds, even without parameter tuning.
First-order optimization methods for nonconvex functions with Lipschitz continuous gradients and Hessian have been studied intensively in both machine learning and optimization. State-of-the-art methods finding an $\varepsilon$-stationary point within $O(\varepsilon^{-{7/4}})$ … First-order optimization methods for nonconvex functions with Lipschitz continuous gradients and Hessian have been studied intensively in both machine learning and optimization. State-of-the-art methods finding an $\varepsilon$-stationary point within $O(\varepsilon^{-{7/4}})$ or $\tilde{O}(\varepsilon^{-{7/4}})$ gradient evaluations are based on Nesterov's accelerated gradient descent (AGD) or Polyak's heavy-ball method. However, these algorithms employ additional mechanisms, such as restart schemes and negative curvature exploitation, which complicate the algorithms' behavior and make it challenging to apply them to more advanced settings (e.g., stochastic optimization). To realize a simpler algorithm, we investigate the heavy-ball differential equation, a continuous-time analogy of the AGD and heavy-ball methods; we prove that the dynamics attains an $\varepsilon$-stationary point within $O(\varepsilon^{-{7/4}})$ time. We also show that a vanilla heavy-ball algorithm, obtained by discretizing the dynamics, achieves the complexity of $O(\varepsilon^{-{7/4}})$ under an additional assumption.
In recent years, various subspace algorithms have been developed to handle large-scale optimization problems. Although existing subspace Newton methods require fewer iterations to converge in practice, the matrix operations and … In recent years, various subspace algorithms have been developed to handle large-scale optimization problems. Although existing subspace Newton methods require fewer iterations to converge in practice, the matrix operations and full gradient computation are bottlenecks when dealing with large-scale problems. %In this study, We propose a subspace quasi-Newton method that is restricted to a deterministic-subspace together with a gradient approximation based on random matrix theory. Our method does not require full gradients, let alone Hessian matrices. Yet, it achieves the same order of the worst-case iteration complexities in average for convex and nonconvex cases, compared to existing subspace methods. In numerical experiments, we confirm the superiority of our algorithm in terms of computation time.
Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures … Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. While being simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
Time-varying optimization problems are prevalent in various engineering fields, and the ability to solve them accurately in real-time is becoming increasingly important. The prediction-correction algorithms used in smooth time-varying optimization … Time-varying optimization problems are prevalent in various engineering fields, and the ability to solve them accurately in real-time is becoming increasingly important. The prediction-correction algorithms used in smooth time-varying optimization can achieve better accuracy than that of the time-varying gradient descent (TVGD) algorithm. However, none of the existing prediction-correction algorithms can be applied to general non-strongly-convex functions, and most of them are not computationally efficient enough to solve large-scale problems. Here, we propose a new prediction-correction algorithm that is applicable to large-scale and general non-convex problems and that is more accurate than TVGD. Furthermore, we present convergence analyses of the TVGD and proposed prediction-correction algorithms for non-strongly-convex functions for the first time. In numerical experiments using synthetic and real datasets, the proposed algorithm is shown to be able to reduce the convergence error as the theoretical analyses suggest and outperform the existing algorithms.
Bilevel optimization has seen an increasing presence in various domains of applications. In this work, we propose a framework for solving bilevel optimization problems where variables of both lower and … Bilevel optimization has seen an increasing presence in various domains of applications. In this work, we propose a framework for solving bilevel optimization problems where variables of both lower and upper level problems are constrained on Riemannian manifolds. We provide several hypergradient estimation strategies on manifolds and study their estimation error. We provide convergence and complexity analysis for the proposed hypergradient descent algorithm on manifolds. We also extend the developments to stochastic bilevel optimization and to the use of general retraction. We showcase the utility of the proposed framework on various applications.
Bilevel programming has recently received a great deal of attention due to its abundant applications in many areas. The optimal value function approach provides a useful reformulation of the bilevel … Bilevel programming has recently received a great deal of attention due to its abundant applications in many areas. The optimal value function approach provides a useful reformulation of the bilevel problem, but its utility is often limited due to the nonsmoothness of the value function even in cases when the associated lower-level function is smooth. In this paper, we present two smoothing strategies for the value function associated with lower-level functions that are not necessarily smooth but are Lipschitz continuous. The first method employs quadratic regularization for partially convex lower-level functions, while the second utilizes entropic regularization for general lower-level objective functions. Meanwhile, the property known as gradient consistency is crucial in ensuring that a designed smoothing algorithm is globally subsequentially convergent to stationary points of the value function reformulation. With this motivation, we prove that the proposed smooth approximations satisfy the gradient consistent property under certain conditions on the lower-level function.
Zeroth-order optimization, which does not use derivative information, is one of the significant research areas in the field of mathematical optimization and machine learning. Although various studies have explored zeroth-order … Zeroth-order optimization, which does not use derivative information, is one of the significant research areas in the field of mathematical optimization and machine learning. Although various studies have explored zeroth-order algorithms, one of the theoretical limitations is that oracle complexity depends on the dimension, i.e., on the number of variables, of the optimization problem. In this paper, to reduce the dependency of the dimension in oracle complexity, we propose a zeroth-order random subspace algorithm by combining the famous gradient-free algorithm from Nesterov and Spokoiny (2017) with random projection. We derive the worst-case oracle complexity of our proposed method in non-smooth and convex settings. While we show that our method has a global convergence rate equivalent to standard results for full-dimensional non-smooth convex algorithms, we prove that ours also has a local convergence rate independent of the original dimension under additional assumptions. In addition to the theoretical results, numerical experiments show that when an objective function has a specific structure, the proposed method can become experimentally more efficient due to random projection.
We consider an identification method for a linear continuous time-invariant autonomous system from noisy state observations. In particular, we focus on the identification to satisfy the asymptotic stability of the … We consider an identification method for a linear continuous time-invariant autonomous system from noisy state observations. In particular, we focus on the identification to satisfy the asymptotic stability of the system with some prior knowledge. To this end, we propose to model this identification problem as a Riemannian nonlinear optimization (RNLO) problem, where the stability is ensured through a certain Riemannian manifold and the prior knowledge is expressed as nonlinear constraints defined on this manifold. To solve this RNLO, we apply the Riemannian sequential quadratic optimization (RSQO) that was proposed by Obara, Okuno, and Takeda (2022) most recently. RSQO performs quite well with theoretical guarantee to find a point satisfying the Karush–Kuhn–Tucker conditions of RNLO. In this article, we demonstrate that the identification problem can be indeed solved by RSQO more effectively than competing algorithms.
Abstract A new Levenberg–Marquardt (LM) method for solving nonlinear least squares problems with convex constraints is described. Various versions of the LM method have been proposed, their main differences being … Abstract A new Levenberg–Marquardt (LM) method for solving nonlinear least squares problems with convex constraints is described. Various versions of the LM method have been proposed, their main differences being in the choice of a damping parameter. In this paper, we propose a new rule for updating the parameter so as to achieve both global and local convergence even under the presence of a convex constraint set. The key to our results is a new perspective of the LM method from majorization-minimization methods. Specifically, we show that if the damping parameter is set in a specific way, the objective function of the standard subproblem in LM methods becomes an upper bound on the original objective function under certain standard assumptions. Our method solves a sequence of the subproblems approximately using an (accelerated) projected gradient method. It finds an $$\varepsilon$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ε</mml:mi> </mml:math> -stationary point after $$O(\varepsilon ^{-2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>ε</mml:mi> <mml:mrow> <mml:mo>-</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> computation and achieves local quadratic convergence for zero-residual problems under a local error bound condition. Numerical results on compressed sensing and matrix factorization show that our method converges faster in many cases than existing methods.
We propose a new first-order method for minimizing nonconvex functions with Lipschitz continuous gradients and H\"older continuous Hessians. The proposed algorithm is a heavy-ball method equipped with two particular restart … We propose a new first-order method for minimizing nonconvex functions with Lipschitz continuous gradients and H\"older continuous Hessians. The proposed algorithm is a heavy-ball method equipped with two particular restart mechanisms. It finds a solution where the gradient norm is less than $\epsilon$ in $O(H_{\nu}^{\frac{1}{2 + 2 \nu}} \epsilon^{- \frac{4 + 3 \nu}{2 + 2 \nu}})$ function and gradient evaluations, where $\nu \in [0, 1]$ and $H_{\nu}$ are the H\"older exponent and constant, respectively. Our algorithm is $\nu$-independent and thus universal; it automatically achieves the above complexity bound with the optimal $\nu \in [0, 1]$ without knowledge of $H_{\nu}$. In addition, the algorithm does not require other problem-dependent parameters as input, including the gradient's Lipschitz constant or the target accuracy $\epsilon$. Numerical results illustrate that the proposed method is promising.
Price determination is a central research topic of revenue management in marketing. The important aspect in pricing is controlling the stochastic behavior of demand, and the previous studies have tackled … Price determination is a central research topic of revenue management in marketing. The important aspect in pricing is controlling the stochastic behavior of demand, and the previous studies have tackled price optimization problems with uncertainties. However, many of those studies assumed that uncertainties are independent of decision variables (i.e., prices) and did not consider situations where demand uncertainty depends on price. Although some price optimization studies have dealt with decision-dependent uncertainty, they make application-specific assumptions in order to obtain an optimal solution or an approximation solution. To handle a wider range of applications with decision-dependent uncertainty, we propose a general non-convex stochastic optimization formulation. This approach aims to maximize the expectation of a revenue function with respect to a random variable representing demand under a decision-dependent distribution. We derived an unbiased stochastic gradient estimator by using a well-tuned variance reduction parameter and used it for a projected stochastic gradient descent method to find a stationary point of our problem. We conducted synthetic experiments and simulation experiments with real data on a retail service application. The results show that the proposed method outputs solutions with higher total revenues than baselines.
We propose randomized subspace gradient methods for high-dimensional constrained optimization. While there have been similarly purposed studies on unconstrained optimization problems, there have been few on constrained optimization problems due … We propose randomized subspace gradient methods for high-dimensional constrained optimization. While there have been similarly purposed studies on unconstrained optimization problems, there have been few on constrained optimization problems due to the difficulty of handling constraints. Our algorithms project gradient vectors onto a subspace that is a random projection of the subspace spanned by the gradients of active constraints. We determine the worst-case iteration complexity under linear and nonlinear settings and theoretically confirm that our algorithms can take a larger step size than their deterministic version. From the advantages of taking longer step and randomized subspace gradients, we show that our algorithms are especially efficient in view of time complexity when gradients cannot be obtained easily. Numerical experiments show that they tend to find better solutions because of the randomness of their subspace selection. Furthermore, they performs well in cases where gradients could not be obtained directly, and instead, gradients are obtained using directional derivatives.
In this paper, we aim at computing all local minimizers of a polynomial optimization problem under genericity conditions. By using a technique in computer algebra, we provide a univariate representation … In this paper, we aim at computing all local minimizers of a polynomial optimization problem under genericity conditions. By using a technique in computer algebra, we provide a univariate representation for the set of local minimizers. In particular, for an unconstrained problem, the coordinates of all local minimizers can be represented by several univariate polynomial equalities and one univariate polynomial matrix inequality. We also develop the technique for constrained problems having equality constraints. Based on the above technique, we design algorithms to enumerate the local minimizers. At the end of the paper, we provide some experimental examples.
In this paper, we propose the approximate Bregman proximal gradient algorithm (ABPG) for solving composite nonconvex optimization problems. ABPG employs a new distance that approximates the Bregman distance, making the … In this paper, we propose the approximate Bregman proximal gradient algorithm (ABPG) for solving composite nonconvex optimization problems. ABPG employs a new distance that approximates the Bregman distance, making the subproblem of ABPG simpler to solve compared to existing Bregman-type algorithms. The subproblem of ABPG is often expressed in a closed form. Similarly to existing Bregman-type algorithms, ABPG does not require the global Lipschitz continuity for the gradient of the smooth part. Instead, assuming the smooth adaptable property, we establish the global subsequential convergence under standard assumptions. Additionally, assuming that the Kurdyka--{\L}ojasiewicz property holds, we prove the global convergence for a special case. Our numerical experiments on the $\ell_p$ regularized least squares problem, the $\ell_p$ loss problem, and the nonnegative linear system show that ABPG outperforms existing algorithms especially when the gradient of the smooth part is not globally Lipschitz or even local Lipschitz continuous.
Random projection techniques based on the Johnson--Lindenstrauss lemma are used for randomly aggregating the constraints or variables of optimization problems while approximately preserving their optimal values, which leads to smaller-scale … Random projection techniques based on the Johnson--Lindenstrauss lemma are used for randomly aggregating the constraints or variables of optimization problems while approximately preserving their optimal values, which leads to smaller-scale optimization problems. D'Ambrosio et al. [Math. Program., 183 (2020), pp. 619--647] have applied random projection to a quadratic optimization problem so as to decrease the number of decision variables. Although the problem size becomes smaller, the projected problem will also almost surely be nonconvex if the original problem is nonconvex and hence will be hard to solve. In this paper, by focusing on the fact that the level of the nonconvexity of a nonconvex quadratic optimization problem can be alleviated by random projection, we find an approximate global optimal value of the problem by attributing it to a convex problem with smaller size. To the best of our knowledge, our paper is the first to use random projection for convexification of nonconvex optimization problems. We evaluate the approximation error between optimum values of a nonconvex optimization problem and its convexified randomly projected problem.
We consider optimization problems on Riemannian manifolds with equality and inequality constraints, which we call Riemannian nonlinear optimization (RNLO) problems. Although they have numerous applications, the existing studies on them … We consider optimization problems on Riemannian manifolds with equality and inequality constraints, which we call Riemannian nonlinear optimization (RNLO) problems. Although they have numerous applications, the existing studies on them are limited especially in terms of algorithms. In this paper, we propose Riemannian sequential quadratic optimization (RSQO) that uses a line-search technique with an $\ell_{1}$ penalty function as an extension of the standard SQO algorithm for constrained nonlinear optimization problems in Euclidean spaces to Riemannian manifolds. We prove its global convergence to a Karush--Kuhn--Tucker point of the RNLO problem by means of parallel transport and the exponential mapping. Furthermore, we establish its local quadratic convergence by analyzing the relationship between sequences generated by RSQO and the Riemannian Newton method. Ours is the first algorithm that has both global and local convergence properties for constrained nonlinear optimization on Riemannian manifolds. Empirical results show that RSQO finds solutions more stably and with higher accuracy compared with the existing Riemannian penalty and augmented Lagrangian methods.
The Gaussian homotopy (GH) method is a popular approach to finding better stationary points for non-convex optimization problems by gradually reducing a parameter value $t$, which changes the problem to … The Gaussian homotopy (GH) method is a popular approach to finding better stationary points for non-convex optimization problems by gradually reducing a parameter value $t$, which changes the problem to be solved from an almost convex one to the original target one. Existing GH-based methods repeatedly call an iterative optimization solver to find a stationary point every time $t$ is updated, which incurs high computational costs. We propose a novel single loop framework for GH methods (SLGH) that updates the parameter $t$ and the optimization decision variables at the same. Computational complexity analysis is performed on the SLGH algorithm under various situations: either a gradient or gradient-free oracle of a GH function can be obtained for both deterministic and stochastic settings. The convergence rate of SLGH with a tuned hyperparameter becomes consistent with the convergence rate of gradient descent, even though the problem to be solved is gradually changed due to $t$. In numerical experiments, our SLGH algorithms show faster convergence than an existing double loop GH method while outperforming gradient descent-based methods in terms of finding a better solution.
Minimizing the sum of a convex function and a composite function appears in various fields. The generalized Levenberg--Marquardt (LM) method, also known as the prox-linear method, has been developed for … Minimizing the sum of a convex function and a composite function appears in various fields. The generalized Levenberg--Marquardt (LM) method, also known as the prox-linear method, has been developed for such optimization problems. The method iteratively solves strongly convex subproblems with a damping term. This study proposes a new generalized LM method for solving the problem with a smooth composite function. The method enjoys three theoretical guarantees: iteration complexity bound, oracle complexity bound, and local convergence under a H\"olderian growth condition. The local convergence results include local quadratic convergence under the quadratic growth condition; this is the first to extend the classical result for least-squares problems to a general smooth composite function. In addition, this is the first LM method with both an oracle complexity bound and local quadratic convergence under standard assumptions. These results are achieved by carefully controlling the damping parameter and solving the subproblems by the accelerated proximal gradient method equipped with a particular termination condition. Experimental results show that the proposed method performs well in practice for several instances, including classification with a neural network and nonnegative matrix factorization.
Gradient Langevin dynamics and a variety of its variants have attracted increasing attention owing to their convergence towards the global optimal solution, initially in the unconstrained convex framework while recently … Gradient Langevin dynamics and a variety of its variants have attracted increasing attention owing to their convergence towards the global optimal solution, initially in the unconstrained convex framework while recently even in convex constrained non-convex problems. In the present work, we extend those frameworks to non-convex problems on a non-convex feasible region with a global optimization algorithm built upon reflected gradient Langevin dynamics and derive its convergence rates. By effectively making use of its reflection at the boundary in combination with the probabilistic representation for the Poisson equation with the Neumann boundary condition, we present promising convergence rates, particularly faster than the existing one for convex constrained non-convex problems.
Community detection is a fundamental network-analysis primitive with a variety of applications in diverse domains. Although the modularity introduced by Newman and Girvan (2004) has widely been used as a … Community detection is a fundamental network-analysis primitive with a variety of applications in diverse domains. Although the modularity introduced by Newman and Girvan (2004) has widely been used as a quality function for community detection, it has some drawbacks. The modularity density introduced by Li et al. (2008) is known to be an effective alternative to the modularity, which mitigates one of the drawbacks called the resolution limit. A large body of work has been devoted to designing exact and heuristic methods for modularity density maximization, without any computational complexity analysis. In this study, we investigate modularity density maximization from both algorithmic and computational complexity aspects. Specifically, we first accelerate column generation for the modularity density maximization problem. To this end, we point out that the auxiliary problem appearing in column generation can be viewed as a dense subgraph discovery problem. Then we employ a well-known strategy for dense subgraph discovery, called the greedy peeling, for approximately solving the auxiliary problem. Moreover, we reformulate the auxiliary problem to a sequence of $0$--$1$ linear programming problems, enabling us to compute its optimal value more efficiently and to get more diverse columns. Computational experiments using a variety of real-world networks demonstrate the effectiveness of our proposed algorithm. Finally, we show the NP-hardness of a slight variant of the modularity density maximization problem, where the output partition has to have two or more clusters, as well as showing the NP-hardness of the auxiliary problem.
We consider a class of sparsity-inducing optimization problems whose constraint set is regularizer-compatible, in the sense that, the constraint set becomes easy-to-project-onto after a coordinate transformation induced by the sparsity-inducing … We consider a class of sparsity-inducing optimization problems whose constraint set is regularizer-compatible, in the sense that, the constraint set becomes easy-to-project-onto after a coordinate transformation induced by the sparsity-inducing regularizer. Our model is general enough to cover, as special cases, the ordered LASSO model and its variants with some commonly used nonconvex sparsity-inducing regularizers. The presence of both the sparsity-inducing regularizer and the constraint set poses challenges on the design of efficient algorithms. In this paper, by exploiting absolute-value symmetry and other properties in the sparsity-inducing regularizer, we propose a new algorithm, called the Doubly Majorized Algorithm (DMA), for this class of problems. The DMA makes use of projections onto the constraint set after the coordinate transformation in each iteration, and hence can be performed efficiently. Without invoking any commonly used constraint qualification conditions such as those based on horizon subdifferentials, we show that any accumulation point of the sequence generated by DMA is a so-called $\psi_{\rm opt}$-stationary point, a new notion of stationarity we define as inspired by the notion of $L$-stationarity. We also show that any global minimizer of our model has to be a $\psi_{\rm opt}$-stationary point, again without imposing any constraint qualification conditions. Finally, we illustrate numerically the performance of DMA on solving variants of ordered LASSO with nonconvex regularizers.
While there already exist randomized subspace Newton methods that restrict the search direction to a random subspace for a convex function, we propose a randomized subspace regularized Newton method for … While there already exist randomized subspace Newton methods that restrict the search direction to a random subspace for a convex function, we propose a randomized subspace regularized Newton method for a non-convex function {and more generally we investigate thoroughly the local convergence rate of the randomized subspace Newton method}. In our proposed algorithm using a modified Hessian of the function restricted to some random subspace, with high probability, the function value decreases even when the objective function is non-convex. In this paper, we show that our method has global convergence under appropriate assumptions and its convergence rate is the same as that of the full regularized Newton method. %We also prove that Furthermore, we can obtain a local linear convergence rate under some additional assumptions, and prove that this rate is the best we can hope, in general, when using random subspace. We furthermore prove that if the Hessian at the local optimum is rank defficient then superlienar convergence holds.
We extend the Levenberg-Marquardt method on Euclidean spaces to Riemannian manifolds. Although a Riemannian Levenberg-Marquardt (RLM) method was produced by Peeters in 1993, to the best of our knowledge, there … We extend the Levenberg-Marquardt method on Euclidean spaces to Riemannian manifolds. Although a Riemannian Levenberg-Marquardt (RLM) method was produced by Peeters in 1993, to the best of our knowledge, there has been no analysis of theoretical guarantees for global and local convergence properties. As with the Euclidean LM method, how to update a specific parameter known as the damping parameter has significant effects on its performances. We propose a trust-region-like approach for determining the parameter. We evaluate the worst-case iteration complexity to reach an epsilon-stationary point, and also prove that it has desirable local convergence properties under the local error-bound condition. Finally, we demonstrate the efficiency of our proposed algorithm by numerical experiments.
Many social phenomena are triggered by public opinion that is formed in the process of opinion exchange among individuals. To date, from the engineering point of view, a large body … Many social phenomena are triggered by public opinion that is formed in the process of opinion exchange among individuals. To date, from the engineering point of view, a large body of work has been devoted to studying how to manipulate individual opinions so as to guide public opinion towards the desired state. Recently, Abebe et al. (KDD 2018) have initiated the study of the impact of interventions at the level of susceptibility rather than the interventions that directly modify individual opinions themselves. For the model, Chan et al. (The Web Conference 2019) designed a local search algorithm to find an optimal solution in polynomial time. However, it can be seen that the solution obtained by solving the above model might not be implemented in real-world scenarios. In fact, as we do not consider the amount of changes of the susceptibility, it would be too costly to change the susceptibility values for agents based on the solution.
Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the … Although application examples of multilevel optimization have already been discussed since the 1990s, the development of solution methods was almost limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with the $T$ steepest descent update equations with some prechosen iteration number $T$. In this paper, we have developed a gradient-based algorithm for multilevel optimization with $n$ levels based on their idea and proved that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with some theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model considering data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings.
Model extraction attacks have become serious issues for service providers using machine learning. We consider an adversarial setting to prevent model extraction under the assumption that attackers will make their … Model extraction attacks have become serious issues for service providers using machine learning. We consider an adversarial setting to prevent model extraction under the assumption that attackers will make their best guess on the service provider's model using query accesses, and propose to build a surrogate model that significantly keeps away the predictions of the attacker's model from those of the true model. We formulate the problem as a non-convex constrained bilevel optimization problem and show that for kernel models, it can be transformed into a non-convex 1-quadratically constrained quadratic program with a polynomial-time algorithm to find the global optimum. Moreover, we give a tractable transformation and an algorithm for more complicated models that are learned by using stochastic gradient descent-based algorithms. Numerical experiments show that the surrogate model performs well compared with existing defense models when the difference between the attacker's and service provider's distributions is large. We also empirically confirm the generalization ability of the surrogate model.
We propose a primal-dual interior-point method (IPM) with convergence to second-order stationary points (SOSPs) of nonlinear semidefinite optimization problems, abbreviated as NSDPs. As far as we know, the current algorithms … We propose a primal-dual interior-point method (IPM) with convergence to second-order stationary points (SOSPs) of nonlinear semidefinite optimization problems, abbreviated as NSDPs. As far as we know, the current algorithms for NSDPs only ensure convergence to first-order stationary points such as Karush-Kuhn-Tucker points, but without a worst-case iteration complexity. The proposed method generates a sequence approximating SOSPs while minimizing a primal-dual merit function for NSDPs by using scaled gradient directions and directions of negative curvature. Under some assumptions, the generated sequence accumulates at an SOSP with a worst-case iteration complexity. This result is also obtained for a primal IPM with a slight modification. Finally, our numerical experiments show the benefits of using directions of negative curvature in the proposed method.