A Limited Memory Algorithm for Bound Constrained Optimization

Type: Article
Publication Date: 1995-09-01
Citations: 5567
DOI: https://doi.org/10.1137/0916069

Abstract

An algorithm for solving large nonlinear optimization problems with simple bounds is described. It is based on the gradient projection method and uses a limited memory BFGS matrix to approximate the Hessian of the objective function. It is shown how to take advantage of the form of the limited memory approximation to implement the algorithm efficiently. The results of numerical tests on a set of large problems are reported.
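
For orientation, the algorithm described in this abstract is the basis of the L-BFGS-B code exposed through SciPy's scipy.optimize.minimize; a minimal usage sketch on a small bound-constrained problem (the Rosenbrock objective, the bounds, and the memory size below are illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.optimize import minimize

# Two-dimensional Rosenbrock function with simple bounds (illustrative problem).
def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    return np.array([
        -400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0]**2),
    ])

res = minimize(f, x0=np.array([0.1, 0.1]), jac=grad, method="L-BFGS-B",
               bounds=[(0.0, 0.8), (0.0, 0.8)],  # simple bounds l <= x <= u
               options={"maxcor": 10})           # number of stored correction pairs
print(res.x, res.fun, res.nit)
```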

Locations

  • SIAM Journal on Scientific Computing
  • University of North Texas Digital Library (University of North Texas)
  • OSTI OAI (U.S. Department of Energy Office of Scientific and Technical Information)

A subspace limited BFGS algorithm is proposed for bound constrained optimization. Global convergence is established under suitable conditions. Numerical results show that this method is more competitive than the normal method.
A description is given of the impact of the identification properties of the gradient projection method on algorithms for the solution of large-scale optimization problems with bound constraints. In particular, the author describes results of J.J. Moré and G. Toraldo (Rep. MCS-P77-0589, Argonne Nat. Lab., 1989), which show that the gradient projection method can be combined with the conjugate gradient method to solve an important class of large-scale (number of variables between 5000 and 15000) bound constrained quadratic programming problems in a few (less than 15) iterations.
In this paper we present two new numerical methods for unconstrained large-scale optimization. These methods apply update formulae, which are derived by considering different techniques of approximating the objective function. Theoretical analysis is given to show the advantages of using these update formulae. It is observed that these update formulae can be employed within the framework of limited memory strategy with only a modest increase in the linear algebra cost. Comparative results with limited memory BFGS (L-BFGS) method are presented.
In this paper, a modified limited memory BFGS method for solving large-scale unconstrained optimization problems is proposed. A remarkable feature of the proposed method is that it possesses a global convergence property even without convexity assumption on the objective function. The implementations of the algorithm on CUTE test problems are reported, which suggest that a slight improvement has been achieved.
We briefly describe the impact of the identification properties of the gradient projection method on algorithms for the solution of large-scale optimization problems with bound constraints. In particular, we describe recent results of Moré and Toraldo which show that the gradient projection method can be combined with the conjugate gradient method to solve an important class of large-scale (number of variables between 5000 and 15000) bound constrained quadratic programming problems in a few (less than 15) iterations.
In this paper we propose a subspace limited memory quasi-Newton method for solving large-scale optimization with simple bounds on the variables. The limited memory quasi-Newton method is used to update the variables with indices outside of the active set, while the projected gradient method is used to update the active variables. The search direction consists of three parts: a subspace quasi-Newton direction, and two subspace gradient and modified gradient directions. Our algorithm can be applied to large-scale problems as there is no need to solve any subproblems. The global convergence of the method is proved and some numerical results are also given.
Gradient projection methods represent effective tools for solving large-scale constrained optimization problems thanks to their simple implementation and low computational cost per iteration. Despite these good properties, a slow convergence rate can affect gradient projection schemes, especially when highly accurate solutions are needed. A strategy to mitigate this drawback consists in properly selecting the values for the steplength along the negative gradient. In this paper, we consider the class of gradient projection methods with line search along the projected arc for box-constrained minimization problems and we analyse different strategies to define the steplength. It is well known in the literature that steplength selection rules able to approximate, at each iteration, the eigenvalues of the inverse of a suitable submatrix of the Hessian of the objective function can improve the performance of gradient projection methods. In this perspective, we propose an automatic hybrid steplength selection technique that employs a proper alternation of standard Barzilai–Borwein rules, when the final active set is not well approximated, and a generalized limited memory strategy based on the Ritz-like values of the Hessian matrix restricted to the inactive constraints, when the final active set is reached. Numerical experiments on quadratic and non-quadratic test problems show the effectiveness of the proposed steplength scheme.
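
To make the projected-arc/steplength idea concrete, here is a minimal gradient-projection sketch with a plain Barzilai–Borwein (BB1) steplength for a box-constrained quadratic; the paper's hybrid Ritz-like rule is not reproduced, and the function name, tolerances, and iteration count are illustrative:

```python
import numpy as np

def gradient_projection_bb(Q, c, lo, hi, x0, iters=200):
    """Gradient projection with a Barzilai-Borwein (BB1) steplength for
    min 0.5*x'Qx + c'x subject to lo <= x <= hi (a minimal sketch)."""
    x = np.clip(x0, lo, hi)
    g = Q @ x + c
    alpha = 1.0                                 # initial steplength
    for _ in range(iters):
        x_new = np.clip(x - alpha * g, lo, hi)  # step along -g, project onto the box
        g_new = Q @ x_new + c
        s, y = x_new - x, g_new - g
        if s @ y > 1e-12:
            alpha = (s @ s) / (s @ y)           # BB1 rule: alpha = s's / s'y
        x, g = x_new, g_new
    return x
```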
Recently, Neumaier and Azmi gave a comprehensive convergence theory for a generic algorithm for bound constrained optimization problems with a continuously differentiable objective function. The algorithm combines an active set strategy with a gradient-free line search along a piecewise linear search path defined by directions chosen to reduce zigzagging. This paper describes an efficient implementation of this scheme. It employs new limited memory techniques for computing the search directions, improves robustness by adding various safeguards relevant when finite precision arithmetic is used, and adds many practical enhancements in other details. The paper compares it and several other solvers on the unconstrained and bound constrained problems from the collection and makes recommendations on which solver to use and when; depending on the problem class, the problem dimension, and the precise goal, different solvers are recommended.
The limited memory steepest descent method (LMSD, Fletcher, 2012) for unconstrained optimization problems stores a few past gradients to compute multiple stepsizes at once. We review this method and propose new variants. For strictly convex quadratic objective functions, we study the numerical behavior of different techniques to compute new stepsizes. In particular, we introduce a method to improve the use of harmonic Ritz values. We also show the existence of a secant condition associated with LMSD, where the approximating Hessian is projected onto a low-dimensional space. In the general nonlinear case, we propose two new alternatives to Fletcher's method: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously. We show that Fletcher's method can also be interpreted from this viewpoint. AMS Classification: 65K05, 90C20, 90C30, 65F15, 65F10.
We study the influence of cell-level mechanical heterogeneity in epithelial tissues using a vertex-based model. Heterogeneity in single cell stiffness is introduced as a quenched random variable in the preferred shape index ($p_0$) for each cell. We uncovered a crossover scaling for the tissue shear modulus, suggesting that tissue collective rigidity is controlled by a single parameter $f_r$, which accounts for the fraction of rigid cells. Interestingly, the rigidity onset occurs at $f_r=0.21$, far below the contact percolation threshold of rigid cells. Due to the separation of rigidity and contact percolations, heterogeneity can enhance tissue rigidity and gives rise to an intermediate solid state. The influence of heterogeneity on tumor invasion dynamics is also investigated. There is an overall impedance of invasion as the tissue becomes more rigid. Invasion can also occur in the intermediate heterogeneous solid state and is characterized by significant spatial-temporal intermittency.
Forecasting elections---a challenging, high-stakes problem---is the subject of much uncertainty, subjectivity, and media scrutiny. To shed light on this process, we develop a method for forecasting elections from the perspective of dynamical systems. Our model borrows ideas from epidemiology, and we use polling data from United States elections to determine its parameters. Surprisingly, our model performs as well as popular forecasters for the 2012 and 2016 U.S. presidential, senatorial, and gubernatorial races. Although contagion and voting dynamics differ, our work suggests a valuable approach for elucidating how elections are related across states. It also illustrates the effect of accounting for uncertainty in different ways, provides an example of data-driven forecasting using dynamical systems, and suggests avenues for future research on political elections. We conclude with our forecasts for the senatorial and gubernatorial races on 6 November 2018 (which we posted on 5 November 2018).
Learning a smooth skeleton in a low-dimensional space from noisy data becomes important in computer vision and computational biology. Existing methods assume that the manifold constructed from the data is smooth, but they lack the ability to model skeleton structures from noisy data. To overcome this issue, we propose a novel probabilistic structured learning model to learn the density of latent embedding given high-dimensional data and its neighborhood graph. The embedded points that form a smooth skeleton structure are obtained by maximum a posteriori (MAP) estimation. Our analysis shows that the resulting similarity matrix is sparse and unique, and its associated kernel has eigenvalues that follow a power law distribution, which leads to the embeddings of a smooth skeleton. The model is extended to learn a sparse similarity matrix when the graph structure is unknown. Extensive experiments demonstrate the effectiveness of the proposed methods on various datasets by comparing them with existing methods.
We study regression using functional predictors in situations where these functions contain both phase and amplitude variability. In other words, the functions are misaligned due to errors in time measurements, and these errors can significantly degrade both model estimation and prediction performance. The current techniques either ignore the phase variability, or handle it via pre-processing, i.e., use an off-the-shelf technique for functional alignment and phase removal. We develop a functional principal component regression model which takes a comprehensive approach to handling phase and amplitude variability. The model utilizes a mathematical representation of the data known as the square-root slope function. These functions preserve the $\mathbf{L}^2$ norm under warping and are ideally suited for simultaneous estimation of regression and warping parameters. Using both simulated and real-world data sets, we demonstrate our approach and evaluate its prediction performance relative to current models. In addition, we propose an extension to functional logistic and multinomial logistic regression.
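
For reference, the square-root slope function mentioned above is usually defined as follows (a standard formulation from the elastic functional data analysis literature, stated here as an assumption rather than quoted from the abstract):

```latex
% Square-root slope function of an absolutely continuous f, the warping action,
% and the L^2-norm invariance under warping referred to in the abstract:
q(t) = \operatorname{sign}\!\bigl(\dot f(t)\bigr)\sqrt{\lvert \dot f(t)\rvert},
\qquad
(q,\gamma) \mapsto (q\circ\gamma)\sqrt{\dot\gamma},
\qquad
\bigl\|(q\circ\gamma)\sqrt{\dot\gamma}\bigr\|_{\mathbf{L}^2} = \|q\|_{\mathbf{L}^2}.
```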
An elegant and robust approach for discrete adjoint-based automatic differentiation of Fortran codes is proposed in this work. The technique aims at bridging the gap within the Fortran programming language, which currently lacks certain metaprogramming paradigms that are readily available at compile time in C/C++ codes. More specifically, the present work uses an expression-based tape approach, which is the first-of-its-kind implementation in the Fortran programming language, that can significantly reduce the memory footprint while improving the computational efficiency of adjoint-based automatic differentiation (AD). The proposed expression-template-based approach is incorporated in our in-house AD toolbox, which currently is the only Fortran-based tool in the literature that uses a fixed-point-type operator-overloading adjoint sensitivity analysis. The improved version of the fast automatic differentiation using operator-overloading technique toolbox is then coupled with the in-house unstructured parallel compressible (UNPAC) flow solver for a robust design optimization framework (DOF), called UNPAC-DOF. The efficiency and robustness of the proposed technique and the resulting framework are tested for aerodynamic shape optimization problems applied to airfoil and wing geometries.
In this work, we investigate an inverse problem of recovering multiple orders in a time-fractional diffusion model from the data observed at one single point on the boundary. We prove the unique recovery of the orders together with their weights, which does not require a full knowledge of the domain or medium properties, e.g. diffusion and potential coefficients, initial condition and source in the model. The proof is based on Laplace transform and asymptotic expansion. Furthermore, inspired by the analysis, we propose a numerical procedure for recovering these parameters based on a nonlinear least-squares fitting with either fractional polynomials or rational approximations as the model function, and provide numerical experiments to illustrate the approach for small time $t$.
In this paper, we study the linear transport model by adopting a deep learning method, in particular a deep neural network (DNN) approach. While interest in using DNNs to study partial differential equations is rising, here we adapt it to study kinetic models, in particular the linear transport model. Moreover, theoretical analysis on the convergence of the neural network and its approximated solution towards the analytic solution is shown. We demonstrate the accuracy and effectiveness of the proposed DNN method in numerical experiments.
This paper proposes a novel profile likelihood method for estimating the covariance parameters in exploratory factor analysis of high-dimensional Gaussian datasets with fewer observations than variables. An implicitly restarted Lanczos algorithm and a limited-memory quasi-Newton method are implemented to develop a matrix-free framework for likelihood maximization. Simulation results show that our method is substantially faster than the expectation-maximization solution without sacrificing accuracy. Our method is applied to fit factor models on data from suicide attempters, suicide ideators, and a control group.
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
This paper is devoted to the resolution of an inverse scattering problem. Such problems are known to be highly nonlinear and ill-posed. In our case, the reconstruction is rendered all the more difficult by the high contrast: the physical characteristics of the scattering object greatly differ from those of the surrounding medium. To properly counterbalance the ill-posedness of the problem, an original shape model taking into account precise prior information about the geometry of the scattering object is proposed. Moreover, the scatterer shape is described with a reduced number of variables and the parametrization enables one to solve the inverse problem with a gradient-based algorithm. This contributes to reducing the computational cost. Tests conducted on synthetic datasets reveal good performance of the proposed algorithm, contrary to more classical approaches based on a Tikhonov regularization scheme or the use of an edge-preserving penalization function.
We propose a new assumption in outlier detection: normal data instances are commonly located in regions where the data density hardly fluctuates, while outliers often appear in regions where the data density fluctuates strongly. Based on this hypothesis, we apply a novel density-based approach to unsupervised outlier detection. This approach, called Quantum Clustering (QC), deals with unlabeled data processing and constructs a potential function to find the centroids of clusters and the outliers. The experiments show that the potential function can clearly and effectively find the hidden outliers in data points. Besides, by using QC, we can find more subtle outliers by adjusting the parameter $σ$. Moreover, our approach is also evaluated on two datasets (Air Quality Detection and Darwin Correspondence Project) from two different research areas, and the results show the wide applicability of our method.
Recent developments in numerical methods for solving large differentiable nonlinear optimization problems are reviewed. State-of-the-art algorithms for solving unconstrained, bound-constrained, linearly constrained and nonlinearly constrained problems are discussed. As well as important conceptual advances and theoretical aspects, emphasis is also placed on more practical issues, such as software availability.
The alternating direction method of multipliers (ADMM) is applied to constrained quadratic optimization of intensity-modulated radiation therapy (IMRT) planning. The proximal operator for the ADMM algorithm is developed, and the algorithm is applied to different clinical treatment sites (liver, prostate, and head & neck). The results are compared with the solutions obtained by several other commercial and non-commercial optimizers. The ADMM solver is about one to two orders of magnitude faster than the other solvers; moreover, it requires less computer memory.
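
As a generic illustration of the splitting involved (not the paper's IMRT-specific proximal operator), an ADMM sketch for a box-constrained quadratic program might look like the following; all names and parameters are illustrative:

```python
import numpy as np

def admm_box_qp(Q, c, lo, hi, rho=1.0, iters=300):
    """Generic ADMM for min 0.5*x'Qx + c'x subject to lo <= x <= hi.
    Illustrative sketch only; the paper develops a proximal operator
    tailored to IMRT planning objectives."""
    n = len(c)
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    A = Q + rho * np.eye(n)                        # system matrix for the x-update
    for _ in range(iters):
        x = np.linalg.solve(A, rho * (z - u) - c)  # x-update: unconstrained quadratic
        z = np.clip(x + u, lo, hi)                 # z-update: projection onto the box
        u += x - z                                 # scaled dual update
    return z
```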
In Germany, local health departments are responsible for surveillance of the current pandemic situation. One of their major tasks is to monitor infected persons. For instance, the direct contacts of infectious persons at group meetings have to be traced and potentially quarantined. Such quarantine requirements may be revoked when all contact persons obtain a negative polymerase chain reaction (PCR) test result. However, contact tracing and testing is time-consuming, costly and not always feasible. In this work, we present a statistical model for the probability that no transmission of COVID-19 occurred given an arbitrary number of negative test results among contact persons. Hereby, the time-dependent sensitivity and specificity of the PCR test are taken into account. We employ a parametric Bayesian model which combines an adaptable Beta-Binomial prior and two likelihood components in a novel fashion. This is illustrated for group events in German school classes. The first evaluation on a real-world dataset showed that our approach can support important quarantine decisions with the goal to achieve a better balance between necessary containment of the pandemic and preservation of social and economic life. Future work will focus on further refinement and evaluation of quarantine decisions based on our statistical model.
In this paper, we consider the problem of reconstructing a space-dependent coefficient in a linear Benjamin–Bona–Mahony (BBM)-type equation from a single measurement of its solution at a given time. We analyze the well-posedness of the forward initial-boundary value problem and characterize the inverse problem as a constrained optimization one. Our objective consists in reconstructing the variable coefficient in the BBM equation by minimizing an appropriate regularized Tikhonov-type functional constrained by the BBM equation. The well-posedness of the forward problem is studied and approximated numerically by combining a finite-element strategy for spatial discretization using the Python-FEniCS package with a second-order implicit scheme for time stepping. The minimization of the adopted Tikhonov regularization is performed using an iterative L-BFGS-B quasi-Newton algorithm as described, for instance, by Byrd et al. (1995) and Zhu et al. (1997). Numerical simulations are presented to demonstrate the robustness of the proposed method with noisy data. The local stability and uniqueness of the solution to the constrained optimization problem for a fixed value of the regularization parameter are also proved and illustrated numerically.
Using a two-piece normal distribution to model univariate data that exhibits symmetry and uni/bimodality is notably effective. In this respect, the shape parameter value determines whether unimodality or bimodality is present. This paper proposes a flexible uni/bimodal distribution with platykurtic density, which can be used to simulate a variety of data. The concept is based on transforming a random variable into a folded distribution. Further, the proposed class includes the normal distribution as a sub-model. In the current study, the maximum likelihood method is considered for deriving the main structural properties and for the estimation of parameters. In addition, simulation experiments are presented to evaluate the behavior of estimators. Finally, fitting and regression applications are presented to illustrate the usefulness of the proposed distribution for data modeling in different real-life scenarios.
The K-means algorithm, widely used in cluster analysis, is a centroid-based clustering method known for its high efficiency and scalability. However, in realistic situations, the operating environment is susceptible to contamination caused by outliers and distribution departures, which may lead to clustering results from K-means that are distorted or rendered invalid. In this paper, we introduce three alternative algorithms, K-weighted-medians, K-weighted-L2-medians, and K-weighted-HLs, to address these issues under the consideration of data with weights. The impact of contamination is investigated by examining the estimation effects on optimal cluster centroids. We explore the robustness of the clustering algorithms from the perspective of the breakdown point, and then conduct experiments on simulated and real datasets to evaluate their performance using new numerical metrics. The results demonstrate the effectiveness of the proposed K-weighted-HLs algorithm, which surpasses the other algorithms in scenarios involving both contamination issues.
We present the R package galamm, whose goal is to provide common ground between structural equation modeling and mixed effect models. It supports estimation of models with an arbitrary number of crossed or nested random effects, smoothing splines, mixed response types, factor structures, heteroscedastic residuals, and data missing at random. Implementation using sparse matrix methods and automatic differentiation ensures computational efficiency. We here briefly present the implemented methodology, give an overview of the package and an example demonstrating its use.
Latent classification models are a class of statistical methods for identifying unobserved class membership among study samples using observed data. In this study, we propose a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (i.e., small bias) with reasonable confidence intervals (i.e., achieving approximately 95% coverage probability). Furthermore, we compared our model to four other existing latent class models and demonstrated that our approach yields higher prediction accuracies for latent classes. We applied our proposed method to analyze COVID-19 data collected in Houston, Texas, US between January 1st, 2021 and December 31st, 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly by case. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were associated with the disease.
Consider a Bayesian setup in which we observe
The purpose of this paper is to develop variational integrators for conservative mechanical systems that are symplectic and energy and momentum conserving. To do this, a space–time view of variational integrators is employed and time step adaptation is used to impose the constraint of conservation of energy. Criteria for the solvability of the time steps and some numerical examples are given.
We consider the problem of meta-analyzing two-group studies that report the median of the outcome. Often, these studies are excluded from meta-analysis because there are no well-established statistical methods to pool the difference of medians. To include these studies in meta-analysis, several authors have recently proposed methods to estimate the sample mean and standard deviation from the median, sample size, and several commonly reported measures of spread. Researchers frequently apply these methods to estimate the difference of means and its variance for each primary study and pool the difference of means using inverse variance weighting. In this work, we develop several methods to directly meta-analyze the difference of medians. We conduct a simulation study evaluating the performance of the proposed median-based methods and the competing transformation-based methods. The simulation results show that the median-based methods outperform the transformation-based methods when meta-analyzing studies that report the median of the outcome, especially when the outcome is skewed. Moreover, we illustrate the various methods on a real-life data set.
Numerical simulation models increasingly complex phenomena. Such problems, often high-dimensional, require sophisticated codes that are computationally expensive. Systematic recourse to the simulator then becomes impractical. The preferred approach consists in defining a reduced number of simulations organized according to a numerical design of experiments, from which a response surface is fitted to approximate the simulator. We consider here designs generated for deterministic simulators in an exploratory phase, i.e., when there is no a priori knowledge. The designs therefore require certain properties, such as space-filling and a good distribution of points in projection. Two indicators quantifying the intrinsic quality of designs were developed. The essential point of this work concerns a design procedure based on simulating samples from a probability distribution by a Markov chain Monte Carlo method.
We present generalized additive latent and mixed models (GALAMMs) for analysis of clustered data with responses and latent variables depending smoothly on observed variables. A scalable maximum likelihood estimation algorithm is proposed, utilizing the Laplace approximation, sparse matrix computation, and automatic differentiation. Mixed response types, heteroscedasticity, and crossed random effects are naturally incorporated into the framework. The models developed were motivated by applications in cognitive neuroscience, and two case studies are presented. First, we show how GALAMMs can jointly model the complex lifespan trajectories of episodic memory, working memory, and speed/executive function, measured by the California Verbal Learning Test (CVLT), digit span tests, and Stroop tests, respectively. Next, we study the effect of socioeconomic status on brain structure, using data on education and income together with hippocampal volumes estimated by magnetic resonance imaging. By combining semiparametric estimation with latent variable modeling, GALAMMs allow a more realistic representation of how brain and cognition vary across the lifespan, while simultaneously estimating latent traits from measured items. Simulation experiments suggest that model estimates are accurate even with moderate sample sizes.
With the increasing availability of non-Euclidean data objects, statisticians are faced with the task of developing appropriate statistical methods for their analysis. For regression models in which the predictors lie in $\mathbb{R}^p$ and the response variables are situated in a metric space, conditional Fréchet means can be used to define the Fréchet regression function. Global and local Fréchet methods have recently been developed for modeling and estimating this regression function as extensions of multiple and local linear regression, respectively. This paper expands on these methodologies by proposing the Fréchet single index model, in which the Fréchet regression function is assumed to depend only on a scalar projection of the multivariate predictor. Estimation is performed by combining local Fréchet along with M-estimation to estimate both the coefficient vector and the underlying regression function, and these estimators are shown to be consistent. The method is illustrated by simulations for response objects on the surface of the unit sphere and through an analysis of human mortality data in which lifetable data are represented by distributions of age-of-death, viewed as elements of the Wasserstein space of distributions.
The James-Stein estimator is an estimator of the multivariate normal mean and dominates the maximum likelihood estimator (MLE) under squared error loss. The original work inspired great interest in developing shrinkage estimators for a variety of problems. Nonetheless, research on shrinkage estimation for manifold-valued data is scarce. In this article, we propose shrinkage estimators for the parameters of the Log-Normal distribution defined on the manifold of
Stan is a free and open-source C++ program that performs Bayesian inference or optimization for arbitrary user-specified models, can be called from the command line, R, Python, Matlab, or Julia, and has great promise for fitting large and complex statistical models in many areas of application. We discuss Stan from users' and developers' perspectives and illustrate with a simple but nontrivial nonlinear regression example.
Understanding what shapes species phenotypes over macroevolutionary timescales from comparative data often requires studying the relationship between phenotypes and putative explanatory factors or testing for differences in phenotypes across species groups. In phyllostomid bats for example, is mandible morphology associated to diet preferences? Performing such analyses depends upon reliable phylogenetic regression techniques and associated tests (e.g., phylogenetic Generalized Least Squares, pGLS, and phylogenetic analyses of variance and covariance, pANOVA, pANCOVA). While these tools are well established for univariate data, their multivariate counterparts are lagging behind. This is particularly true for high-dimensional phenotypic data, such as morphometric data. Here, we implement much-needed likelihood-based multivariate pGLS, pMANOVA, and pMANCOVA, and use a recently developed penalized-likelihood framework to extend their application to the difficult case when the number of traits $p$ approaches or exceeds the number of species $n$. We then focus on the pMANOVA and use intensive simulations to assess the performance of the approach as $p$ increases, under various levels of phylogenetic signal and correlations between the traits, phylogenetic structure in the predictors, and under various types of phenotypic differences across species groups. We show that our approach outperforms available alternatives under all circumstances, with greater power to detect phenotypic differences across species group when they exist, and a lower risk of improperly detecting nonexistent differences. Finally, we provide an empirical illustration of our pMANOVA on a geometric-morphometric data set describing mandible morphology in phyllostomid bats along with data on their diet preferences. Overall our results show significant differences between ecological groups. Our approach, implemented in the R package mvMORPH and illustrated in a tutorial for end-users, provides efficient multivariate phylogenetic regression tools for understanding what shapes phenotypic differences across species. [Generalized least squares; high-dimensional data sets; multivariate phylogenetic comparative methods; penalized likelihood; phenomics; phyllostomid bats; phylogenetic MANOVA; phylogenetic regression.]
The limited-memory BFGS method (L-BFGS) has become the workhorse optimization strategy for many large-scale nonlinear optimization problems. A major difficulty with L-BFGS is that, although the memory size $m$ can have a significant effect on performance, it is difficult to know what memory size will work best. Importantly, a larger $m$ does not necessarily improve performance, but may, in fact, degrade it. There is no guidance in the literature on how to choose $m$. In this paper, we briefly review L-BFGS and then suggest two computationally efficient ways to measure the effectiveness of various memory sizes, thus allowing us to adaptively choose a different memory size at each iteration. Our overall numerical results illustrate that our approach improves the performance of the L-BFGS method. These results also indicate some further directions for research.
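
The sensitivity to the memory size $m$ discussed above can be observed directly through SciPy's L-BFGS-B interface, whose maxcor option sets $m$; this sketch merely compares a few fixed memory sizes on an illustrative problem and does not implement the paper's adaptive choice:

```python
import numpy as np
from scipy.optimize import minimize

# Same problem solved with different limited-memory sizes m (SciPy's `maxcor`),
# to illustrate that the choice of m affects iteration counts and accuracy.
def rosen(x):
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

x0 = np.full(50, -1.0)
for m in (3, 5, 10, 20):
    res = minimize(rosen, x0, method="L-BFGS-B", options={"maxcor": m})
    print(f"m = {m:2d}: iterations = {res.nit}, f = {res.fun:.3e}")
```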
Nondegeneracy conditions that guarantee that the optimal active constraints are identified in a finite number of iterations are studied. Results of this type have only been established for a few algorithms, and then under restrictive hypotheses. The main result is a characterization of those algorithms that identify the optimal constraints in a finite number of iterations. This result is obtained with a nondegeneracy assumption which is equivalent, in the standard nonlinear programming problem, to the assumption that there is a set of strictly complementary Lagrange multipliers. As an important consequence of these results, it is shown how this characterization applies to gradient projection and sequential quadratic programming algorithms.
This note gives a construction for minimizing certain twice-differentiable functions on a closed convex subset $C$ of a Hilbert space $H$. The algorithm assumes one can constructively "project" points onto convex sets. A related algorithm may be found in Cheney-Goldstein [1], where a constructive fixed-point theorem is employed to construct points inducing a minimum distance between two convex sets. In certain instances when such projections are not too difficult to construct, say on spheres, linear varieties, and orthants, the method can be effective. For applications to control theory, see, for example, Balakrishnan [2] and Goldstein [3]. In what follows $P$ will denote the "projection" operator for the convex set $C$. This operator, which is well defined and Lipschitzian, assigns to a given point in $H$ its closest point in $C$ (see, e.g., [1]). Take $x \in H$ and $y \in C$. Then $[x - y, P(x) - y] \ge \|P(x) - y\|^2$. In the nontrivial case this inequality is a consequence of the fact that $C$ is supported by a hyperplane through $P(x)$ with normal $x - P(x)$. Let $f$ be a real-valued function on $H$ and $x_0$ an arbitrary point of $C$. Let $S$ denote the level set $\{x \in C : f(x) \le f(x_0)\}$, and let $\hat S$ be any open set containing the convex hull of $S$. Let $f'(x, \cdot) = [\nabla f(x), \cdot]$ signify the Fréchet derivative of $f$ at $x$. A point $z$ in $C$ will be called stationary if $P(z - \rho \nabla f(z)) = z$ for all $\rho > 0$; equivalently, when $f$ is convex, the linear functional $f'(z, \cdot)$ achieves a minimum on $C$ at $z$.
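
A minimal sketch of the projection iteration this note describes, using the stationarity condition $P(z - \rho \nabla f(z)) = z$ as a stopping test; the fixed steplength, helper names, and toy problem below are illustrative assumptions, not taken from the note:

```python
import numpy as np

def projected_gradient(grad, project, x0, rho=0.1, tol=1e-8, iters=1000):
    """Iterate x <- P(x - rho * grad(x)) and stop once x is (approximately)
    stationary, i.e. P(x - rho * grad(x)) == x."""
    x = project(x0)
    for _ in range(iters):
        x_new = project(x - rho * grad(x))
        if np.linalg.norm(x_new - x) < tol:   # approximate stationarity test
            return x_new
        x = x_new
    return x

# Example: minimize ||x - a||^2 over the nonnegative orthant.
a = np.array([1.0, -2.0, 0.5])
x_star = projected_gradient(lambda x: 2.0 * (x - a),
                            lambda x: np.maximum(x, 0.0),
                            x0=np.zeros(3))
print(x_star)   # converges to [1.0, 0.0, 0.5]
```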
We describe the results of a series of tests for a class of new methods of trust region type for solving the simple bound constrained minimization problem. The results are encouraging and lead us to believe that the methods will prove useful in solving large-scale problems.
We study how to use the BFGS quasi-Newton matrices to precondition minimization methods for problems where the storage is critical. We give an update formula which generates matrices using information from the last $m$ iterations, where $m$ is any number supplied by the user. The quasi-Newton matrix is updated at every iteration by dropping the oldest information and replacing it by the newest information. It is shown that the matrices generated have some desirable properties. The resulting algorithms are tested numerically and compared with several well-known methods.
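
The limited-memory scheme described in this abstract is nowadays usually applied through the standard two-loop recursion below, which multiplies the gradient by the inverse-Hessian approximation built from the last $m$ correction pairs; this is a textbook formulation offered for orientation, not a transcription of the paper's formula:

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    """Two-loop recursion: apply the limited-memory inverse-Hessian
    approximation built from the stored pairs (s_i, y_i) to the gradient g,
    returning the search direction -H_k g."""
    q = g.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):   # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if s_list:                                              # H_0 = gamma * I scaling
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):  # oldest pair first
        rho = 1.0 / (y @ s)
        beta = rho * (y @ r)
        r += (a - beta) * s
    return -r                                               # quasi-Newton direction
```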
This paper extends the known excellent global convergence properties of trust region algorithms for unconstrained optimization to the case where bounds on the variables are present. Weak conditions on the accuracy of the Hessian approximations are considered. It is also shown that, when the strict complementarity condition holds, the proposed algorithms reduce to an unconstrained calculation after finitely many iterations, allowing a fast asymptotic rate of convergence.
We consider the problem $\min \{ f(x) \mid x \ge 0 \}$, and propose algorithms of the form $x_{k+1} = [x_k - \alpha_k D_k \nabla f(x_k)]^+$, where $[\cdot]^+$ denotes projection on the positive orthant, $\alpha_k$ is a stepsize chosen by an Armijo-like rule, and $D_k$ is a positive definite symmetric matrix which is partly diagonal. We show that $D_k$ can be calculated simply on the basis of second derivatives of $f$ so that the resulting Newton-like algorithm has a typically superlinear rate of convergence. With other choices of $D_k$ convergence at a typically linear rate is obtained. The algorithms are almost as simple as their unconstrained counterparts. They are well suited for problems of large dimension such as those arising in optimal control while being competitive with existing methods for low-dimensional problems. The effectiveness of the Newton-like algorithm is demonstrated via computational examples involving as many as 10,000 variables. Extensions to general linearly constrained problems are also provided. These extensions utilize a notion of an active generalized rectangle patterned after the notion of an active manifold used in manifold suboptimization methods. By contrast with these methods, many constraints can be added or subtracted from the binding set at each iteration without the need to solve a quadratic programming problem.
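
A minimal sketch of the iteration $x_{k+1} = [x_k - \alpha_k D_k \nabla f(x_k)]^+$ with a diagonal scaling and an Armijo-like backtracking rule; the paper builds $D_k$ from second derivatives and treats nearly-active indices specially, which this simplified version omits, and all names below are illustrative:

```python
import numpy as np

def projected_scaled_gradient(f, grad, diag_scale, x0, iters=100,
                              beta=0.5, sigma=1e-4):
    """Sketch of x_{k+1} = [x_k - alpha_k * D_k * grad f(x_k)]^+ with a
    diagonal D_k and an Armijo-like backtracking rule on the projection arc."""
    x = np.maximum(x0, 0.0)
    for _ in range(iters):
        g = grad(x)
        d = diag_scale(x) * g                         # D_k * grad f(x), D_k diagonal
        alpha = 1.0
        for _ in range(50):                           # backtrack to sufficient decrease
            x_new = np.maximum(x - alpha * d, 0.0)    # projection onto the orthant
            if f(x_new) <= f(x) - sigma * g @ (x - x_new):
                break
            alpha *= beta
        if np.allclose(x_new, x):
            break                                     # (approximately) stationary
        x = x_new
    return x
```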