Mathematics Statistics and Probability

Statistical Methods and Inference

Description

This cluster of papers focuses on regularization and variable selection methods, particularly in the context of high-dimensional data analysis. It covers topics such as Lasso, model selection, sparse models, covariance estimation, survival analysis, random forests, and Bayesian methods.

Keywords

Regularization; Variable Selection; Lasso; Model Selection; High-Dimensional Data; Sparse Models; Covariance Estimation; Survival Analysis; Random Forests; Bayesian Methods

The Journal of Electronic Imaging (JEI), copublished bimonthly with the Society for Imaging Science and Technology, publishes peer-reviewed papers that cover research and applications in all areas of electronic imaging science and technology.
The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace (i.e., double-exponential) priors. Gibbs sampling from this posterior is possible using an expanded hierarchy with conjugate normal priors for the regression parameters and independent exponential priors on their variances. A connection with the inverse-Gaussian distribution provides tractable full conditional distributions. The Bayesian Lasso provides interval estimates (Bayesian credible intervals) that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. Slight modifications lead to Bayesian versions of other Lasso-related estimation methods, including bridge regression and a robust variant.
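A minimal sketch of the Gibbs sampler implied by this expanded hierarchy, assuming a centred response, standardized predictors, and a fixed Lasso parameter lambda (the hyperprior and marginal-likelihood options mentioned above are omitted):

```python
import numpy as np
from numpy.linalg import solve
from scipy.stats import invgamma

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, seed=0):
    """Sketch of a Bayesian Lasso Gibbs sampler (Park-Casella style hierarchy).

    Assumes y is centred and the columns of X are standardized; lam is held
    fixed here rather than given a hyperprior or chosen by marginal likelihood.
    Returns the posterior draws of the regression coefficients.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    sigma2 = 1.0
    inv_tau2 = np.ones(p)              # 1 / tau_j^2
    draws = np.empty((n_iter, p))
    XtX, Xty = X.T @ X, X.T @ y
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 * A^{-1}), with A = X'X + D_tau^{-1}
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(np.linalg.inv(A))
        beta = solve(A, Xty) + np.sqrt(sigma2) * L @ rng.standard_normal(p)
        # sigma2 | rest ~ Inverse-Gamma((n-1)/2 + p/2, RSS/2 + beta'D^{-1}beta/2)
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        scale = resid @ resid / 2 + beta @ (inv_tau2 * beta) / 2
        sigma2 = invgamma.rvs(shape, scale=scale, random_state=rng)
        # 1/tau_j^2 | rest ~ Inverse-Gaussian(mean = sqrt(lam^2 sigma2 / beta_j^2), shape = lam^2)
        mean = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mean, lam**2)
        draws[it] = beta
    return draws
```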
Nonparametric regression is a set of techniques for estimating a regression curve without making strong assumptions about the shape of the true regression function. These techniques are therefore useful for building and checking parametric models, as well as for data description. Kernel and nearest-neighbor regression estimators are local versions of univariate location estimators, and so they can readily be introduced to beginning students and consulting clients who are familiar with such summaries as the sample mean and median.
A New Measure of Rank Correlation. M. G. Kendall. Biometrika, Volume 30, Issue 1-2, June 1938, Pages 81-93. https://doi.org/10.1093/biomet/30.1-2.81
This article presents bootstrap methods for estimation, using simple arguments. Minitab macros for implementing these methods are given.
The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the ℓ1 penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct of our theory, the nonnegative garotte is shown to be consistent for variable selection.
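A minimal sketch of the adaptive lasso via column rescaling, with an OLS pilot fit and a cross-validated penalty; the choice of pilot estimator and weight exponent gamma are user decisions here, not prescriptions from the paper:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

def adaptive_lasso(X, y, gamma=1.0):
    """Adaptive-lasso sketch: weight the l1 penalty by 1/|beta_init|^gamma,
    implemented by rescaling the columns of X. The pilot estimate is OLS;
    a ridge pilot is a common alternative when p is large relative to n."""
    beta_init = LinearRegression().fit(X, y).coef_
    w = 1.0 / np.maximum(np.abs(beta_init), 1e-8) ** gamma
    X_scaled = X / w                     # column j divided by w_j
    fit = LassoCV(cv=5).fit(X_scaled, y)
    return fit.coef_ / w                 # map back to the original scale
```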
Statistical Inference in Instrumental Variables Regression with I(1) Processes. Peter C. B. Phillips and Bruce E. Hansen (Cowles Foundation for Research in Economics, Yale University). The Review of Economic Studies, Volume 57, Issue 1, January 1990, Pages 99-125. https://doi.org/10.2307/2297545
In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests, $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
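A short example of computing the full piecewise-linear Lasso coefficient path with scikit-learn's LARS implementation on hypothetical simulated data:

```python
import numpy as np
from sklearn.linear_model import lars_path

# Simulated data; lars_path with method="lasso" returns the entire Lasso
# path, with one set of coefficients at every breakpoint of the path.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = X[:, 0] - 2 * X[:, 3] + rng.standard_normal(100)

alphas, active, coefs = lars_path(X, y, method="lasso")
print(coefs.shape)   # (n_features, n_breakpoints)
print(active)        # order in which variables entered the model
```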
Motivation: Modern data acquisition based on high-throughput technology is often facing the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a solution to this problem. However, the majority of available imputation methods are restricted to one type of variable only: continuous or categorical. For mixed-type data, the different types are usually handled separately. Therefore, these methods ignore possible relations between variable types. We propose a non-parametric method which can cope with different types of variables simultaneously. Results: We compare several state of the art methods for the imputation of missing values. We propose and evaluate an iterative imputation method (missForest) based on a random forest. By averaging over many unpruned classification or regression trees, random forest intrinsically constitutes a multiple imputation scheme. Using the built-in out-of-bag error estimates of random forest, we are able to estimate the imputation error without the need of a test set. Evaluation is performed on multiple datasets coming from a diverse selection of biological fields with artificially introduced missing values ranging from 10% to 30%. We show that missForest can successfully handle missing values, particularly in datasets including different types of variables. In our comparative study, missForest outperforms other methods of imputation especially in data settings where complex interactions and non-linear relations are suspected. The out-of-bag imputation error estimates of missForest prove to be adequate in all settings. Additionally, missForest exhibits attractive computational efficiency and can cope with high-dimensional data. Availability: The R package missForest is freely available from http://stat.ethz.ch/CRAN/.
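A rough approximation of the missForest idea for continuous variables only, using scikit-learn's iterative imputer with random-forest base learners; the original algorithm also handles categorical variables with classification trees and uses its own convergence rule and out-of-bag error estimate:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer

# Simulated continuous data with about 10% of entries missing at random.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
X[rng.random(X.shape) < 0.1] = np.nan

# Each variable with missing entries is iteratively regressed on the others
# using a random forest, and its missing values are replaced by predictions.
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X)
print(np.isnan(X_imputed).sum())   # 0: all entries filled in
```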
The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics. T. S. Breusch (University of Southampton) and A. R. Pagan (Australian National University). The Review of Economic Studies, Volume 47, Issue 1, 1980, Pages 239-253. https://doi.org/10.2307/2297111
Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
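For concreteness, the SCAD penalty (one of the nonconcave penalties of this type) and its derivative can be written as below; this is a standard statement of the penalty with the commonly used a = 3.7, not a reproduction of the paper's optimization algorithm:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated elementwise on |theta|; a = 3.7 is the value
    commonly suggested in the literature."""
    t = np.abs(theta)
    small = t <= lam
    mid = (t > lam) & (t <= a * lam)
    return np.where(small, lam * t,
           np.where(mid, (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                    lam**2 * (a + 1) / 2))

def scad_derivative(theta, lam, a=3.7):
    """p'_lam(|theta|): equals lam near zero (producing sparsity), decays
    linearly, and vanishes for |theta| > a*lam, which removes the bias of
    the plain l1 penalty on large coefficients."""
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0) / ((a - 1) * lam) * (t > lam))
```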
Matching As An Econometric Evaluation Estimator. James J. Heckman (University of Chicago), Hidehiko Ichimura (University of Pittsburgh), and Petra Todd (University of Pennsylvania). The Review of Economic Studies, Volume 65, Issue 2, April 1998, Pages 261-294. https://doi.org/10.1111/1467-937X.00044
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
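A bare-bones sketch of the cyclical coordinate-descent update for the Gaussian elastic net, assuming standardized predictors and a centred response; it omits the active-set tricks, warm starts along the path, and sparse-matrix handling that make the published implementation fast:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used by the lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def enet_coordinate_descent(X, y, lam, alpha=1.0, n_iter=200):
    """Cyclical coordinate descent for the Gaussian elastic net.

    Assumes columns of X have mean 0 and unit variance and y is centred.
    alpha=1 gives the lasso penalty, alpha=0 the ridge penalty."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual
            z = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1 - alpha))
    return beta
```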
Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. Carina Mood. European Sociological Review, Volume 26, Issue 1, February 2010, Pages 67-82. https://doi.org/10.1093/esr/jcp006
Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error, and thus the MAE would be a better metric for that purpose. While some concerns over using RMSE raised by Willmott and Matsuura (2005) and Willmott et al. (2009) are valid, the proposed avoidance of RMSE in favor of MAE is not the solution. Citing the aforementioned papers, many researchers chose MAE over RMSE to present their model evaluation statistics when presenting or adding the RMSE measures could be more beneficial. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric, whereas Willmott et al. (2009) indicated that the sums-of-squares-based statistics do not satisfy this rule. In the end, we discuss some circumstances where using the RMSE will be more beneficial. However, we do not contend that the RMSE is superior over the MAE. Instead, a combination of metrics, including but certainly not limited to RMSEs and MAEs, is often required to assess model performance.
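A small illustration of the Gaussian case discussed above, using simulated errors rather than the authors' experiments: for zero-mean Gaussian errors the RMSE estimates the error standard deviation, while the MAE converges to sigma times sqrt(2/pi), roughly 0.8 sigma.

```python
import numpy as np

def rmse(err):
    return np.sqrt(np.mean(np.square(err)))

def mae(err):
    return np.mean(np.abs(err))

# Simulated N(0, 1) errors: RMSE is close to 1, MAE close to sqrt(2/pi) ~ 0.798.
err = np.random.default_rng(2).normal(0.0, 1.0, 100_000)
print(rmse(err), mae(err))
```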
Recent work by Reiss and Ogden provides a theoretical basis for sometimes preferring restricted maximum likelihood (REML) to generalized cross-validation (GCV) for smoothing parameter selection in semiparametric regression. However, existing REML or marginal likelihood (ML) based methods for semiparametric generalized linear models (GLMs) use iterative REML or ML estimation of the smoothing parameters of working linear approximations to the GLM. Such indirect schemes need not converge and fail to do so in a non-negligible proportion of practical analyses. By contrast, very reliable prediction error criteria smoothing parameter selection methods are available, based on direct optimization of GCV, or related criteria, for the GLM itself. Since such methods directly optimize properly defined functions of the smoothing parameters, they have much more reliable convergence properties. The paper develops the first such method for REML or ML estimation of smoothing parameters. A Laplace approximation is used to obtain an approximate REML or ML for any GLM, which is suitable for efficient direct optimization. This REML or ML criterion requires that Newton–Raphson iteration, rather than Fisher scoring, be used for GLM fitting, and a computationally stable approach to this is proposed. The REML or ML criterion itself is optimized by a Newton method, with the derivatives required obtained by a mixture of implicit differentiation and direct methods. The method will cope with numerical rank deficiency in the fitted model and in fact provides a slight improvement in numerical robustness on the earlier method of Wood for prediction error criteria based smoothness selection. Simulation results suggest that the new REML and ML methods offer some improvement in mean-square error performance relative to GCV or Akaike's information criterion in most cases, without the small number of severe undersmoothing failures to which Akaike's information criterion and GCV are prone. This is achieved at the same computational cost as GCV or Akaike's information criterion. The new approach also eliminates the convergence failures of previous REML- or ML-based approaches for penalized GLMs and usually has lower computational cost than these alternatives. Example applications are presented in adaptive smoothing, scalar on function regression and generalized additive model selection.
In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R*, the minimum probability of error over all decision rules taking underlying probability structure into account. However, in a large sample analysis, we will show in the M-category case that R* ≤ R ≤ R*(2 − MR*/(M−1)), where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
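A small numerical check of the stated bound in a case where the Bayes risk is known in closed form: two equally likely unit-variance Gaussian classes in one dimension, for which the Bayes rule thresholds at zero. The simulation setup is ours, not the paper's; for M = 2 the upper bound reduces to 2R*(1 − R*).

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(6)

def sample(n):
    """Class labels 0/1 with equal priors; X ~ N(-1, 1) or N(+1, 1)."""
    y = rng.integers(0, 2, n)
    x = rng.normal(2.0 * y - 1.0, 1.0).reshape(-1, 1)
    return x, y

X_tr, y_tr = sample(20_000)
X_te, y_te = sample(20_000)
r_hat = 1.0 - KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr).score(X_te, y_te)
r_star = norm.cdf(-1.0)                     # Bayes risk = Phi(-1) ~ 0.159
print(r_star, r_hat, 2 * r_star * (1 - r_star))   # R* <= R_hat <= 2R*(1 - R*)
```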
Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper we report on results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (GLMs) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on GLM parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm, the "randomise" algorithm, for permutation inference with the GLM.
We consider the problem of setting approximate confidence intervals for a single parameter θ in a multiparameter family. The standard approximate intervals based on maximum likelihood theory, $\hat{\theta} \pm z^{(\alpha)}\hat{\sigma}$, can be quite misleading. In practice, tricks based on transformations, bias corrections, and so forth, are often used to improve their accuracy. The bootstrap confidence intervals discussed in this article automatically incorporate such tricks without requiring the statistician to think them through for each new application, at the price of a considerable increase in computational effort. The new intervals incorporate an improvement over previously suggested methods, which results in second-order correctness in a wide variety of problems. In addition to parametric families, bootstrap intervals are also developed for nonparametric situations.
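A minimal nonparametric example using SciPy's implementation of BCa intervals; this is a generic illustration with simulated skewed data, not one of the article's examples:

```python
import numpy as np
from scipy.stats import bootstrap

# Skewed sample for which the naive symmetric interval is often poor.
rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=80)

# method="BCa" requests the bias-corrected and accelerated intervals;
# method="percentile" would give the simpler percentile intervals.
res = bootstrap((sample,), np.mean, confidence_level=0.95, method="BCa")
print(res.confidence_interval)
```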
The rule of thumb that logistic and Cox models should be used with a minimum of 10 outcome events per predictor variable (EPV), based on two simulation studies, may be too conservative. The authors conducted a large simulation study of other influences on confidence interval coverage, type I error, relative bias, and other model performance measures. They found a range of circumstances in which coverage and bias were within acceptable levels despite less than 10 EPV, as well as other factors that were as influential as or more influential than EPV. They conclude that this rule can be relaxed, in particular for sensitivity analyses undertaken to demonstrate adequate control of confounding.
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm—the graphical lasso—that is remarkably fast: it solves a 1000-node problem (about 500,000 parameters) in at most a minute and is 30 to 4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
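A short illustration using scikit-learn's implementation of the graphical lasso, with the penalty chosen by cross-validation on data simulated from a sparse precision matrix; this illustrates the idea rather than reproducing the paper's benchmarks:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.datasets import make_sparse_spd_matrix

# Simulate Gaussian data whose precision matrix is sparse: zeros in the
# precision matrix correspond to missing edges in the graphical model.
rng = np.random.default_rng(4)
prec = make_sparse_spd_matrix(25, alpha=0.9, random_state=0)
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(25), cov, size=300)

model = GraphicalLassoCV().fit(X)
edges = (np.abs(model.precision_) > 1e-8) & ~np.eye(25, dtype=bool)
print("estimated number of edges:", edges.sum() // 2)
```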
We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
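For reference, the estimator described above can be stated in its constrained form and in the equivalent penalized form; the notation below is generic, with t the bound on the sum of absolute coefficients and λ the corresponding penalty level:

```latex
% Constrained form of the lasso:
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2}
  \quad\text{subject to}\quad \sum_{j=1}^{p}\lvert\beta_j\rvert \le t .

% Equivalent penalized (Lagrangian) form: each bound t corresponds to some lambda >= 0.
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2}
  + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert .
```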
I propose a new method for variable selection and shrinkage in Cox's proportional hazards model. My proposal minimizes the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant. Because of the nature of this constraint, it shrinks coefficients and produces some coefficients that are exactly zero. As a result it reduces the estimation variance while providing an interpretable final model. The method is a variation of the 'lasso' proposal of Tibshirani, designed for the linear regression context. Simulations indicate that the lasso can be more accurate than stepwise selection in this setting.
This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations, yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.
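A minimal illustration of fitting such estimating equations with an exchangeable working correlation, using statsmodels on hypothetical simulated longitudinal data (repeated binary outcomes per subject):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated panel: 100 subjects, 4 repeated binary measurements each.
rng = np.random.default_rng(5)
n_subj, n_time = 100, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_time),
    "time": np.tile(np.arange(n_time), n_subj),
    "x": rng.standard_normal(n_subj * n_time),
})
df["y"] = (rng.random(len(df)) < 1 / (1 + np.exp(-0.5 * df["x"]))).astype(int)

# GEE with a logit link and an exchangeable working correlation structure;
# the regression estimates remain consistent even if this structure is wrong.
model = sm.GEE.from_formula(
    "y ~ x + time", groups="subject", data=df,
    family=sm.families.Binomial(), cov_struct=sm.cov_struct.Exchangeable(),
)
print(model.fit().summary())
```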
Nonparametric density gradient estimation using a generalized kernel approach is investigated. Conditions on the kernel functions are derived to guarantee asymptotic unbiasedness, consistency, and uniform consistency of the estimates. The results are generalized to obtain a simple mean-shift estimate that can be extended in a k-nearest-neighbor approach. Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.
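A short example of the mean-shift idea applied to clustering, using scikit-learn's modern implementation on simulated data; each point is moved uphill along the estimated density gradient until it reaches a mode, and points sharing a mode form a cluster:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Three well-separated Gaussian blobs; the bandwidth plays the role of
# the kernel window size in the density-gradient estimate.
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.7, random_state=0)
bandwidth = estimate_bandwidth(X, quantile=0.2)
labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
print("clusters found:", len(np.unique(labels)))
```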
The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
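A minimal sketch of the resulting criterion in the special case of a Gaussian regression model with the error variance profiled out (the familiar BIC; the paper's derivation is more general than this formula):

```python
import numpy as np

def bic_gaussian(y, y_hat, n_params):
    """Schwarz criterion for a Gaussian regression fit, up to an additive
    constant: BIC = n * log(RSS / n) + n_params * log(n). Lower is better;
    the log(n) penalty is the leading term of the asymptotic expansion."""
    n = len(y)
    rss = np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)
```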
It is widely known that when there are errors with a moving-average root close to −1, a high order augmented autoregression is necessary for unit root tests to have good size, but that information criteria such as the AIC and the BIC tend to select a truncation lag (k) that is very small. We consider a class of Modified Information Criteria (MIC) with a penalty factor that is sample dependent. It takes into account the fact that the bias in the sum of the autoregressive coefficients is highly dependent on k and adapts to the type of deterministic components present. We use a local asymptotic framework in which the moving-average root is local to −1 to document how the MIC performs better in selecting appropriate values of k. In Monte-Carlo experiments, the MIC is found to yield huge size improvements to the DFGLS and the feasible point optimal PT test developed in Elliott, Rothenberg, and Stock (1996). We also extend the M tests developed in Perron and Ng (1996) to allow for GLS detrending of the data. The MIC along with GLS detrended data yield a set of tests with desirable size and power properties.
Praise for the Third Edition: ". . . an easy-to-read introduction to survival analysis which covers the major concepts and techniques of the subject." (Statistics in Medical Research). Updated and expanded to reflect the latest developments, Statistical Methods for Survival Data Analysis, Fourth Edition continues to deliver a comprehensive introduction to the most commonly used methods for analyzing survival data. Authored by a uniquely well-qualified author team, the Fourth Edition is a critically acclaimed guide to statistical methods with applications in clinical trials, epidemiology, areas of business, and the social sciences. The book features many real-world examples to illustrate applications within these various fields, although special consideration is given to the study of survival data in biomedical sciences. Emphasizing the latest research and providing the most up-to-date information regarding software applications in the field, Statistical Methods for Survival Data Analysis, Fourth Edition also includes: marginal and random effect models for analyzing correlated censored or uncensored data; multiple types of two-sample and K-sample comparison analysis; updated treatment of parametric methods for regression model fitting with a new focus on accelerated failure time models; expanded coverage of the Cox proportional hazards model; and exercises at the end of each chapter to deepen knowledge of the presented material. Statistical Methods for Survival Data Analysis is an ideal text for upper-undergraduate and graduate-level courses on survival data analysis. The book is also an excellent resource for biomedical investigators, statisticians, and epidemiologists, as well as researchers in every field in which the analysis of survival data plays a role.
Modern survival analysis and more general event history analysis may be effectively handled in the mathematical framework of counting processes, stochastic integration, martingale central limit theory and product integration. This book presents this theory, which has been the subject of intense research activity during the past one-and-a-half decades. The exposition of the theory is integrated with the careful presentation of many practical examples, based almost exclusively on the authors' experience, with detailed numerical and graphical illustrations. Statistical Models Based on Counting Processes may be viewed as a research monograph for mathematical statisticians and biostatisticians, although almost all methods are given in sufficient detail to be used in practice by other mathematically oriented researchers studying event histories (demographers, econometricians, epidemiologists, actuarial mathematicians, reliability engineers, biologists). Much of the material has so far only been available in the journal literature (if at all), and a wide variety of researchers will find this an invaluable survey of the subject.
Contents: Rank Tests for Comparing Two Treatments; Comparing Two Treatments or Attributes in a Population Model; Blocked Comparisons for Two Treatments; Paired Comparisons in a Population Model and the One-Sample Problem; The Comparison of More Than Two Treatments; Randomized Complete Blocks; Tests of Randomness and Independence.
The pattern of zero entries in the inverse covariance matrix of a multivariate normal distribution corresponds to conditional independence restrictions between variables. Covariance selection aims at estimating those structural zeros from data. We show that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs. Neighborhood selection estimates the conditional independence restrictions separately for each node in the graph and is hence equivalent to variable selection for Gaussian linear models. We show that the proposed neighborhood selection scheme is consistent for sparse high-dimensional graphs. Consistency hinges on the choice of the penalty parameter. The oracle value for optimal prediction does not lead to a consistent neighborhood estimate. Controlling instead the probability of falsely joining some distinct connectivity components of the graph, consistent estimation for sparse graphs is achieved (with exponential rates), even when the number of variables grows as the number of observations raised to an arbitrary power.
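A rough sketch of neighborhood selection: one lasso regression per node, with the directed neighborhood estimates combined by the OR (or AND) rule. The cross-validated penalty used here is a simplification, not the theoretically calibrated choice analyzed in the paper:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def neighborhood_selection(X, rule="or"):
    """Estimate the edge set of a Gaussian graphical model by regressing
    each variable on all others with the lasso and recording the nonzero
    coefficients as neighbors."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = np.delete(np.arange(p), j)
        coef = LassoCV(cv=5).fit(X[:, others], X[:, j]).coef_
        adj[j, others] = np.abs(coef) > 1e-8
    # Combine the two directed estimates for each pair of nodes.
    return adj | adj.T if rule == "or" else adj & adj.T
```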
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
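A minimal illustration of V-fold cross-validation used for model selection, with hypothetical simulated data and scikit-learn utilities; it is a generic example, not tied to any particular result surveyed above:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Reuse the same folds for every candidate so the comparison is not driven
# by the randomness of the partition.
X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in (0.1, 1.0, 10.0):
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             cv=cv, scoring="neg_mean_squared_error")
    print(alpha, -scores.mean())   # cross-validated risk estimate per model
```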
Researchers have increasingly realized the need to account for within-group dependence in estimating standard errors of regression parameter estimates. The usual solution is to calculate cluster-robust standard errors that permit heteroskedasticity and within-cluster error correlation, but presume that the number of clusters is large. Standard asymptotic tests can over-reject, however, with few (five to thirty) clusters. We investigate inference using cluster bootstrap-t procedures that provide asymptotic refinement. These procedures are evaluated using Monte Carlos, including the example of Bertrand, Duflo, and Mullainathan (2004). Rejection rates of 10% using standard methods can be reduced to the nominal size of 5% using our methods.
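A sketch of one such procedure, a pairs cluster bootstrap-t in which whole clusters are resampled and the cluster-robust t-statistic is recomputed in each resample; the data layout (y, an X matrix that includes a constant column, and an integer groups vector) is hypothetical, and this is only one variant of the bootstraps studied in the paper:

```python
import numpy as np
import statsmodels.api as sm

def cluster_bootstrap_t(y, X, groups, coef_idx=1, n_boot=999, seed=0):
    """Pairs cluster bootstrap-t p-value for one coefficient.

    Clusters are resampled with replacement; each replicate t-statistic is
    centred at the full-sample estimate so that it is (approximately)
    pivotal, and a symmetric two-sided p-value is returned."""
    rng = np.random.default_rng(seed)
    fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
    t_obs = fit.tvalues[coef_idx]
    ids = np.unique(groups)
    t_boot = []
    for _ in range(n_boot):
        chosen = rng.choice(ids, size=len(ids), replace=True)
        rows = np.concatenate([np.flatnonzero(groups == g) for g in chosen])
        new_groups = np.repeat(np.arange(len(chosen)),
                               [np.sum(groups == g) for g in chosen])
        bfit = sm.OLS(y[rows], X[rows]).fit(
            cov_type="cluster", cov_kwds={"groups": new_groups})
        t_boot.append((bfit.params[coef_idx] - fit.params[coef_idx])
                      / bfit.bse[coef_idx])
    t_boot = np.abs(np.array(t_boot))
    return (1 + np.sum(t_boot >= abs(t_obs))) / (n_boot + 1)
```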
We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
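A sketch of the prior-correction idea for the logit intercept after choice-based sampling (keeping all events and only a fraction of nonevents); the numbers below are made up, and the full methodology also includes small-sample bias and probability corrections not shown here:

```python
import numpy as np

def prior_correct_intercept(beta0_hat, tau, ybar):
    """Prior-correct a logit intercept estimated on a case-control style
    sample: tau is the population proportion of events, ybar the proportion
    of events in the collected sample. Slope coefficients are unaffected by
    this kind of outcome-based sampling."""
    return beta0_hat - np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Hypothetical example: events are 0.2% of the population but 25% of the sample.
print(prior_correct_intercept(beta0_hat=-1.1, tau=0.002, ybar=0.25))
```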
Growth curve analysis is a popular method for modeling individual development across time. Specifying growth curve models in a Bayesian framework affords researchers the flexibility of including previous information as prior distributions of parameters. However, common choices of prior distribution for modeling slope variance in a Bayesian growth curve framework make determining the existence of meaningful interindividual differences in intraindividual change across time difficult due to boundary values of these priors. Additionally, many current methods are either technically difficult to implement or are sensitive to model specification. We present a simple data permutation method that reliably distinguishes between longitudinal data with individual slope variation and those without slope variation. We show situations in which the proposed data permutation test outperforms DIC-based model comparison through Monte Carlo simulations and apply this data permutation method to data derived from the National Longitudinal Study of Adolescent to Adult Health.
In this article multiple regression equations are considered. The study is based on a sample that is influenced by the external environment. This external environment is represented in the form of factors that influence the main sample. The sample is divided into parts and a multiple regression equation is constructed for each part. We construct a mixture of regression equations. Open problems are posed concerning the determination of the coefficients of a mixture of nonlinear regression equations via lasso, ridge and elastic net regression estimators.
The path-specific effect (PSE) is of primary interest in mediation analysis when multiple intermediate variables are in the pathway from treatment to outcome, as it can isolate the specific effect through each mediator, thus mitigating potential bias arising from other intermediate variables serving as mediator-outcome confounders. However, estimation and inference of PSE become challenging in the presence of nonignorable missing covariates, a situation particularly common in studies involving sensitive individual information. This paper proposes a fully nonparametric methodology to address this challenge. We establish identification for PSE by expressing it as a function of observed data. By leveraging a shadow variable, we demonstrate that the associated nuisance functions can be uniquely determined through sequential optimization problems. Then, we propose a sieve-based regression imputation approach for estimation. We establish the large-sample theory for the proposed estimator and introduce an approach to make inferences for PSE. The proposed method is applied to the NHANES dataset to investigate the mediation roles of dyslipidemia and obesity in the pathway from Type 2 diabetes mellitus to cardiovascular disease.
In the era of rapid technological advancement and ever-increasing data availability, the field of risk modeling faces both unprecedented challenges and opportunities. Traditional risk modeling approaches, while robust, often struggle to capture the complexity and dynamic nature of modern risk factors. This paper provides a method for dealing with the insurance pricing problem of pricing predictability and MLOT (Money Left On Table) when writing a book of risks. It also gives an example of how to improve risk selection through suitable choices of machine learning algorithm and loss function. We apply this methodology to the provided data and discuss the impacts on risk selection and the predictive power of the models.
Background: Health Data Research Network Canada is tasked with facilitating large-scale health data research, such as statistical analyses that integrate, within a single model, data collected by different organizations, each holding distinct subsets of features corresponding to the same individuals, thereby forming a vertical data partition. To support logistic regression analyses in this setting, we assessed two recently proposed algorithms, VERTIGO and VERTIGO-CI, which enable parameter estimation and confidence interval computation, respectively, with respect to three aspects: the risk of re-identifying patient feature data, communication efficiency, and the extent to which model interpretability is preserved. This study has three main objectives: (1) highlighting confidentiality issues that arise with VERTIGO-CI, as well as those that may occur with VERTIGO when a data node holds only binary covariates; (2) reducing the number of required communication rounds; and (3) proposing an alternative (RidgeLog-V) to VERTIGO that excludes the intercept from the penalty term, which VERTIGO otherwise includes. Methods: We inspected the quantities exchanged in the original algorithms and used linear algebra to identify reverse-engineering procedures that the coordinating center could employ to reconstruct raw data. We also analyzed the objective function of the optimization problem, leading to the proposal of an alternative formulation that requires only a single round of communication while allowing the intercept to be excluded from the penalty term. Results: We showed that, when the VERTIGO-CI algorithm is executed, the coordinating center can reconstruct all individual-level data using simple vector-matrix operations. When the VERTIGO algorithm is executed and a data node has binary covariates only, the coordinating center may be able to recover individual data when parameter estimates are shared. We adapted the VERTIGO algorithm to reduce the number of communications and proposed a variant that excludes the intercept from the penalty term. Conclusions: While the use of VERTIGO-CI, or of VERTIGO with binary covariates, does not involve directly sharing raw data, confidentiality breaches may arise through reverse-engineering, illustrating that the distributed nature of an algorithm does not inherently guarantee data privacy. This work also proposed a new algorithm (RidgeLog-V) that reduces operational costs and enhances model interpretability.
Max Sampson , Kung-Sik Chan | Journal of Computational and Graphical Statistics
This article applies the functional sieve bootstrap (FSB) to estimate the distribution of the partial sum process for time series stemming from a weakly stationary functional process. Consistency of the FSB procedure under weak assumptions on the underlying functional process is established. This result allows for the application of the FSB procedure to testing for a change-point in the mean of a functional time series using the CUSUM-statistic. We show that the FSB asymptotically correctly estimates critical values of the CUSUM-based test under the null hypothesis. Consistency of the FSB-based test under local alternatives is also proven. The finite-sample performance of the procedure is studied via simulations.
Conformal prediction provides machine learning models with prediction sets that offer theoretical guarantees, but the underlying assumption of exchangeability limits its applicability to time series data. Furthermore, existing approaches struggle to handle multi-step ahead prediction tasks, where uncertainty estimates across multiple future time points are crucial. We propose JANET (Joint Adaptive predictioN-region Estimation for Time-series), a novel framework for constructing conformal prediction regions that are valid for both univariate and multivariate time series. JANET generalises the inductive conformal framework and efficiently produces joint prediction regions with controlled K-familywise error rates, enabling flexible adaptation to specific application needs. Our empirical evaluation demonstrates JANET's superior performance in multi-step prediction tasks across diverse time series datasets, highlighting its potential for reliable and interpretable uncertainty quantification in sequential data.
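For background, the basic split-conformal recipe that such frameworks generalise looks as follows; this is a generic regression sketch under exchangeability, not the JANET procedure itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Split-conformal prediction intervals for regression on simulated data.
rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(600, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(600)

fit_idx, cal_idx = np.arange(0, 400), np.arange(400, 600)
model = RandomForestRegressor(random_state=0).fit(X[fit_idx], y[fit_idx])

alpha = 0.1
scores = np.abs(y[cal_idx] - model.predict(X[cal_idx]))   # conformity scores
k = int(np.ceil((1 - alpha) * (len(cal_idx) + 1)))
q = np.sort(scores)[k - 1]                                 # conformal quantile

x_new = np.array([[0.5]])
pred = model.predict(x_new)[0]
print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")
```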
We examine how prior specification affects the Bayesian Dirichlet Auto-Regressive Moving Average (B-DARMA) model for compositional time series. Through three simulation scenarios—correct specification, overfitting, and underfitting—we compare five priors: informative, horseshoe, Laplace, mixture of normals, and hierarchical. Under correct model specification, all priors perform similarly, although the horseshoe and hierarchical priors produce slightly lower bias. When the model overfits, strong shrinkage—particularly from the horseshoe prior—proves advantageous. However, none of the priors can compensate for model misspecification if key VAR/VMA terms are omitted. We apply B-DARMA to daily S&P 500 sector trading data, using a large-lag model to demonstrate overparameterization risks. Shrinkage priors effectively mitigate spurious complexity, whereas weakly informative priors inflate errors in volatile sectors. These findings highlight the critical role of carefully selecting priors and managing model complexity in compositional time-series analysis, particularly in high-dimensional settings.
Le Chang , Yanlin Shi | Australian & New Zealand Journal of Statistics
Fused lasso regression is a popular method for identifying homogeneous groups and sparsity patterns in regression coefficients based on either the presumed order or a more general graph structure of the covariates. However, the traditional fused lasso may yield misleading outcomes in the presence of outliers. In this paper, we propose an extension of the fused lasso, namely the robust adaptive fused lasso (RAFL), which pursues homogeneity and sparsity patterns in regression coefficients while accounting for potential outliers within the data. By using Huber's loss or Tukey's biweight loss, RAFL can resist outliers in the responses or in both the responses and the covariates. We also demonstrate that when the adaptive weights are properly chosen, the proposed RAFL achieves consistency in variable selection, consistency in grouping and asymptotic normality. Furthermore, a novel optimization algorithm, which employs the alternating direction method of multipliers embedded with an accelerated proximal gradient algorithm, is developed to solve RAFL efficiently. Our simulation study shows that RAFL offers substantial improvements in terms of both grouping accuracy and prediction accuracy compared with the fused lasso, particularly when dealing with contaminated data. Additionally, a real analysis of cookie data demonstrates the effectiveness of RAFL.
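As a point of reference for this family of estimators (a schematic form over ordered coefficients, not necessarily RAFL's exact formulation), a robust adaptive fused lasso can be written with a robust loss and adaptive weights on both penalty terms:

```latex
% Generic robust adaptive fused-lasso objective over ordered coefficients;
% rho is a robust loss (e.g. Huber or Tukey biweight), w_j and v_j are
% adaptive weights. This is a sketch, not the paper's exact notation.
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \sum_{i=1}^{n} \rho\!\left(y_i - x_i^{\top}\beta\right)
  \;+\; \lambda_1 \sum_{j=1}^{p} w_j \lvert \beta_j \rvert
  \;+\; \lambda_2 \sum_{j=2}^{p} v_j \lvert \beta_j - \beta_{j-1} \rvert .
```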
Time-varying graphical models provide a powerful framework for capturing the evolving conditional dependencies among high-dimensional variables over time. A widely used method in this context is the Time-Varying Graphical Lasso (TVGL), which estimates a sequence of sparse precision matrices while encouraging temporal smoothness. However, standard TVGL assumes Gaussian-distributed, fully observed data, making it vulnerable to outliers and missing values—common challenges in real-world applications. In this work, we introduce RM-TVGL: a Robust and Missing-Data-Aware Time-Varying Graphical Lasso framework that extends TVGL to accommodate noisy and incomplete data. Our method integrates Huber loss to mitigate the influence of outliers and incorporates an Expectation-Maximization (EM) algorithm to handle missing entries in a principled manner. Additionally, RM-TVGL supports flexible regularization schemes, including ℓ1, ℓ2, and Elastic Net, enabling adaptation to diverse network structures. We develop an efficient ADMM-based optimization algorithm and demonstrate the advantages of RM-TVGL through extensive experiments on both synthetic and real gene expression datasets. The results show that RM-TVGL consistently improves structural accuracy, temporal stability, and robustness compared to existing methods.
Förster resonance energy transfer (FRET) is a widely used tool to probe nanometer scale dynamics, projecting rich 3D biomolecular motion onto noisy 1D traces. However, interpretation of FRET traces remains challenging due to degeneracy—distinct structural states map to similar FRET efficiencies—and often suffers from under- and/or over-fitting due to the need to predefine the number of FRET states and noise characteristics. Here we provide a new software, Bayesian nonparametric FRET (BNP-FRET) for binned data obtained from integrative detectors, that eliminates user-dependent parameters and accurately incorporates all known noise sources, enabling the identification of distinct configurations from 1D traces in a plug-and-play manner. Using simulated and experimental data, we demonstrate that BNP-FRET eliminates the logistical barrier of predetermining states for each FRET trace and permits high-throughput, simultaneous analysis of a large number of kinetically heterogeneous traces. Furthermore, working in the Bayesian paradigm, BNP-FRET naturally provides uncertainty estimates for all model parameters, including the number of states, kinetic rates, and FRET efficiencies.
Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective function f that is locally strongly convex with a sub-quadratic tail. This setting covers many widely used online statistical methods. We introduce a novel piecewise Lyapunov function that enables us to handle functions f with only first-order differentiability, which includes a wide range of popular loss functions such as the Huber loss. Leveraging our proposed Lyapunov function, we derive finite-time moment bounds under general diminishing stepsizes, as well as constant stepsizes. We further establish weak convergence, a central limit theorem and a bias characterization under constant stepsize, providing the first geometrical convergence result for sub-quadratic SGD. Our results have wide applications, especially in online statistical methods. In particular, we discuss two applications of our results. 1) Online robust regression: we consider a corrupted linear model with sub-exponential covariates and heavy-tailed noise. Our analysis provides convergence rates comparable to those for corrupted models with Gaussian covariates and noise. 2) Online quantile regression: importantly, our results relax the common assumption in prior work that the conditional density is continuous and provide a more fine-grained analysis for the moment bounds.
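A toy sketch of online SGD with the Huber loss, the kind of first-order-only objective covered by the analysis above; it uses a constant stepsize and no iterate averaging or theoretical tuning:

```python
import numpy as np

def huber_grad(r, delta=1.345):
    """Derivative of the Huber loss with respect to the residual r."""
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def sgd_huber_regression(X, y, stepsize=0.05, n_epochs=5, seed=0):
    """Online (single-sample) SGD for linear regression with the Huber loss.
    With a constant stepsize the iterates fluctuate around a biased limit;
    averaging the iterates is a common way to reduce that bias."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            r = y[i] - X[i] @ beta
            beta += stepsize * huber_grad(r) * X[i]   # gradient step on one sample
    return beta
```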
Ting Shen | Journal of Research on Educational Effectiveness
Developments in functional data analysis have attracted considerable attention in recent literature. This study aims to provide general information about functional data analysis and demonstrate how it can be enriched with auxiliary tools. When the article is considered as a whole, the results predominantly pertain to functional linear models. The first section discusses the estimation of regression model parameters using the Least Squares method. It explains how the Least Squares method is applied within a functional framework and incorporates auxiliary calculations as part of the modeling process. At this stage, the Error Sum of Squares, which forms the basis of the Least Squares method, is represented as a vector field. The second section addresses the interim estimation problem. In this part, the Bernstein polynomial is combined with the wavelet transform to address the interim estimation challenge. The final section introduces various types of functional data analysis. Specifically, the Bernstein polynomial is used in estimating a functional linear model with functional coefficients. Employing the Bernstein polynomial as a model component in the linear model offers a simpler and more innovative approach compared to traditional functional linear model structures. The methods proposed in this study are generally practical and compatible with the classical framework of functional data analysis.
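Since the Bernstein polynomial is used here as a basis for functional components, a small Python sketch of the basis B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k) and a least-squares coefficient fit may help fix ideas; the degree, grid, and target function are illustrative, and the wavelet-transform combination described in the abstract is not reproduced.

```python
import numpy as np
from scipy.special import comb

def bernstein_basis(x, degree):
    """Evaluate B_{k,n}(x) = C(n,k) x^k (1-x)^(n-k) for k = 0..degree, x in [0, 1].
    Returns an array of shape (len(x), degree + 1)."""
    x = np.asarray(x)[:, None]
    k = np.arange(degree + 1)[None, :]
    return comb(degree, k) * x**k * (1.0 - x)**(degree - k)

# Illustrative least-squares fit of a smooth coefficient function on a grid.
grid = np.linspace(0.0, 1.0, 101)
target = np.sin(2 * np.pi * grid)          # hypothetical coefficient function
B = bernstein_basis(grid, degree=8)
coef, *_ = np.linalg.lstsq(B, target, rcond=None)
approx = B @ coef                          # Bernstein approximation on the grid
```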
"Small sample size, high dimension" data bring tremendous challenges to epilepsy Electroencephalography (EEG) data analysis and seizure onset prediction. Commonly, sparsity technique is introduced to tackle the problem. In this … "Small sample size, high dimension" data bring tremendous challenges to epilepsy Electroencephalography (EEG) data analysis and seizure onset prediction. Commonly, sparsity technique is introduced to tackle the problem. In this paper, we construct a indicator matrix acting as prior knowledge to assist logistic regression model with group lasso penalty to implement seizure prediction. The proposed method selects the feature at the group level, and it achieves the seizure prediction based on the important feature groups, recognizes the unknown clusters properly and performs well for both synthetic data following Bernoulli distribution and dataset CHB-MIT.
Unmeasured confounding is a major concern in obtaining credible inferences about causal effects from observational data. Proximal causal inference is an emerging methodological framework to detect and potentially account for confounding bias by carefully leveraging a pair of negative control exposure and outcome variables, also known as treatment and outcome confounding proxies. Although regression-based proximal causal inference is well-developed for binary and continuous outcomes, analogous proximal causal inference regression methods for right-censored time-to-event outcomes are currently lacking. In this paper, we propose a novel two-stage regression proximal causal inference approach for right-censored survival data under an additive hazard structural model. We provide theoretical justification for the proposed approach tailored to different types of negative control outcomes, including continuous, count, and right-censored time-to-event variables. We illustrate the approach with an evaluation of the effectiveness of right heart catheterization among critically ill patients using data from the SUPPORT study. Our method is implemented in the open-access R package “pci2s.”
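As a rough illustration of the two-stage regression idea in its simplest, continuous-outcome form (the setting the abstract describes as well developed, not the proposed survival extension), the Python sketch below regresses the outcome confounding proxy W on the treatment, the treatment confounding proxy Z, and covariates, then includes the fitted values in the outcome regression. All variable names and the linear specification are my own simplifications, not the pci2s implementation.

```python
import numpy as np

def proximal_two_stage(Y, A, Z, W, X):
    """Simplified two-stage proximal regression sketch for a continuous outcome.
    Z: treatment confounding proxy, W: outcome confounding proxy, X: measured covariates.
    The paper's extension to right-censored outcomes under an additive hazard model
    is not reproduced here."""
    n = len(Y)
    ones = np.ones((n, 1))
    # Stage 1: regress the outcome proxy W on treatment, treatment proxy, and covariates.
    D1 = np.column_stack([ones, A, Z, X])
    W_hat = D1 @ np.linalg.lstsq(D1, W, rcond=None)[0]
    # Stage 2: regress Y on treatment, the stage-1 fitted values, and covariates.
    D2 = np.column_stack([ones, A, W_hat, X])
    beta = np.linalg.lstsq(D2, Y, rcond=None)[0]
    return beta[1]   # coefficient on A: the treatment-effect estimate
```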
Partially linear time series models often suffer from multicollinearity among regressors and autocorrelated errors, both of which can inflate estimation risk. This study introduces a generalized ridge-type kernel (GRTK) framework that combines kernel smoothing with ridge shrinkage and augments it through ordinary and positive-part Stein adjustments. Closed-form expressions and large-sample properties are established, and data-driven criteria (GCV, AICc, BIC, and RECP) are used to tune the bandwidth and shrinkage penalties. Monte Carlo simulations indicate that the proposed procedures usually reduce risk relative to existing semiparametric alternatives, particularly when the predictors are strongly correlated and the error process is dependent. An empirical study of US airline-delay data further demonstrates that GRTK produces a stable, interpretable fit, captures a nonlinear air-time effect overlooked by conventional approaches, and leaves only a modest residual autocorrelation. By tackling multicollinearity and autocorrelation within a single, flexible estimator, the GRTK family offers practitioners a practical avenue for more reliable inference in partially linear time series settings.
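A stripped-down version of the kernel-plus-ridge idea for a partially linear model y = Xβ + g(t) + ε is sketched below in Python: a Nadaraya-Watson smoother partials out the nonparametric component and the residualized parts are ridge-regressed. The fixed bandwidth and penalty stand in for the GCV/AICc/BIC/RECP tuning in the paper, and the Stein adjustments are not included.

```python
import numpy as np

def nw_smoother_matrix(t, bandwidth):
    """Nadaraya-Watson smoother matrix S with a Gaussian kernel: (S y)_i estimates g(t_i)."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def partially_linear_ridge(y, X, t, bandwidth=0.1, ridge=1.0):
    """Speckman-type sketch for y = X beta + g(t) + error: partial out the smooth
    component with S, then ridge-regress the residualized design on the residualized
    response. Bandwidth and ridge penalty are fixed here purely for illustration."""
    S = nw_smoother_matrix(t, bandwidth)
    X_res, y_res = X - S @ X, y - S @ y
    beta = np.linalg.solve(X_res.T @ X_res + ridge * np.eye(X.shape[1]), X_res.T @ y_res)
    g_hat = S @ (y - X @ beta)        # nonparametric component evaluated at t
    return beta, g_hat
```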
Mengyu Zhang, Chao Huang, Fengchang Xie | Journal of Statistical Computation and Simulation
We study the uniform convergence rates of nonparametric estimators for a probability density function and its derivatives when the density has a known pole. Such situations arise in some structural microeconometric models, for example, in auction, labor, and consumer search, where uniform convergence rates of density functions are important for nonparametric and semiparametric estimation. Existing uniform convergence rates based on Rosenblatt's kernel estimator are derived under the assumption that the density is bounded, so they are not applicable when there is a pole in the density. We treat the pole nonparametrically and show that various kernel-based estimators can attain any convergence rate that is slower than the optimal rate under a bounded density, uniformly over an appropriately expanding support, under mild conditions.
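For concreteness, the baseline Rosenblatt kernel density estimator the abstract refers to, f_hat(x) = (1/(n h)) * sum_i K((x - X_i)/h), is sketched below in Python and evaluated on a toy density with a pole at zero; the bandwidth and sample are illustrative, and the pole-aware analysis of the paper is not reproduced.

```python
import numpy as np

def kde_gaussian(x_eval, data, bandwidth):
    """Rosenblatt kernel density estimator with a Gaussian kernel:
    f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    u = (x_eval[:, None] - data[None, :]) / bandwidth
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / bandwidth

# Toy density with a pole at 0: f(x) = 0.5 * x**(-1/2) on (0, 1]; if U ~ Uniform(0, 1),
# then X = U**2 has this density (inverse-CDF sampling).
rng = np.random.default_rng(3)
data = rng.uniform(size=5000) ** 2
grid = np.linspace(0.01, 1.0, 100)
f_hat = kde_gaussian(grid, data, bandwidth=0.05)
```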
Simulation studies are used to evaluate and compare the properties of statistical methods in controlled experimental settings. In most cases, performing a simulation study requires knowledge of the true value of the parameter, or estimand, of interest. However, in many simulation designs, the true value of the estimand is difficult to compute analytically. Here, we illustrate the use of Monte Carlo integration to compute true estimand values in simple and more complex simulation designs. We provide general pseudocode that can be replicated in any software program of choice to demonstrate key principles in using Monte Carlo integration in two scenarios: a simple three-variable simulation where interest lies in the marginally adjusted odds ratio and a more complex causal mediation analysis where interest lies in the controlled direct effect in the presence of mediator-outcome confounders affected by the exposure. We discuss general strategies that can be used to minimize Monte Carlo error and to serve as checks on the simulation program to avoid coding errors. R programming code is provided illustrating the application of our pseudocode in these settings.
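The paper supplies its own pseudocode and R code; purely as an illustration of the first scenario, the Python sketch below uses Monte Carlo integration to compute a true marginally adjusted odds ratio in a three-variable design (confounder, exposure, outcome) by averaging an outcome model over a large sample of the confounder at each fixed exposure level. The coefficients and distributions are hypothetical, not the paper's.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def true_marginal_odds_ratio(n_mc=2_000_000, b0=-1.0, b1=0.7, b2=0.5, seed=0):
    """Monte Carlo integration of the true marginally adjusted odds ratio:
    average the outcome model P(Y = 1 | A = a, L) over a large draw of the
    confounder L at each exposure level a, then form the odds ratio from the
    two standardized risks. Monte Carlo error shrinks as n_mc grows."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal(n_mc)                    # confounder
    p1 = expit(b0 + b1 * 1 + b2 * L).mean()          # standardized risk under exposure = 1
    p0 = expit(b0 + b1 * 0 + b2 * L).mean()          # standardized risk under exposure = 0
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

print(true_marginal_odds_ratio())
```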
Thank you for the opportunity to reply to the comment raised by Hårdemark [...]
Summary Traditional statistical and machine learning methods typically assume that the training and test data follow the same distribution. However, this assumption is frequently violated in real-world applications, where the training data in the source domain may under-represent specific subpopulations in the test data of the target domain. This paper addresses target-independent learning under covariate shift, focusing on multicalibration for survival probability and restricted mean survival time. A black-box post-processing boosting algorithm specifically designed for censored survival data is introduced. By leveraging pseudo-observations, our method produces a multicalibrated predictor that is competitive with inverse propensity score weighting in predicting the survival outcome in an unlabeled target domain, ensuring not only overall accuracy but also fairness across diverse subpopulations. Our theoretical analysis of pseudo-observations builds upon the functional delta method and the p-variational norm. The algorithm’s sample complexity, convergence properties, and multicalibration guarantees for post-processed predictors are provided. Our results establish a fundamental connection between multicalibration and universal adaptability, demonstrating that our calibrated function is comparable to, or outperforms, the inverse propensity score weighting estimator. Extensive numerical simulations and a real-world case study on cardiovascular disease risk prediction using two large prospective cohort studies validate the effectiveness of our approach.
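Pseudo-observations are central to the approach, so a minimal Python sketch of their textbook jackknife construction for the survival probability at a fixed time, PO_i = n * S_hat(t) - (n - 1) * S_hat^(-i)(t) with S_hat the Kaplan-Meier estimator, is given below; the censoring scenario is a toy example, and the multicalibration boosting step itself is not implemented.

```python
import numpy as np

def km_survival(time, event, t):
    """Kaplan-Meier estimate of S(t) from durations `time` and event indicators `event`."""
    surv = 1.0
    for tj in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= tj)
        deaths = np.sum((time == tj) & (event == 1))
        surv *= 1.0 - deaths / at_risk
    return surv

def pseudo_observations(time, event, t):
    """Jackknife pseudo-observations for the survival probability at time t:
    PO_i = n * S_hat(t) - (n - 1) * S_hat^(-i)(t). Each subject, censored or not,
    receives a value that downstream methods can treat like an uncensored outcome."""
    n = len(time)
    full = km_survival(time, event, t)
    keep = np.ones(n, dtype=bool)
    po = np.empty(n)
    for i in range(n):
        keep[i] = False
        po[i] = n * full - (n - 1) * km_survival(time[keep], event[keep], t)
        keep[i] = True
    return po

# Toy usage with exponential event and censoring times.
rng = np.random.default_rng(4)
T, C = rng.exponential(1.0, 200), rng.exponential(1.5, 200)
time, event = np.minimum(T, C), (T <= C).astype(int)
po = pseudo_observations(time, event, t=1.0)
```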