
Summary The lasso penalizes a least squares regression by the sum of the absolute values (L1-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to 0). We propose the 'fused lasso', a generalization that is designed for problems with features that can be ordered in some meaningful way. The fused lasso penalizes the L1-norm of both the coefficients and their successive differences. Thus it encourages sparsity of the coefficients and also sparsity of their differences, i.e. local constancy of the coefficient profile. The fused lasso is especially useful when the number of features p is much greater than N, the sample size. The technique is also extended to the 'hinge' loss function that underlies the support vector classifier. We illustrate the methods on examples from protein mass spectroscopy and gene expression data.
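The penalty described in this summary can be written out explicitly. In Lagrangian form (the tuning parameters $\lambda_1$ and $\lambda_2$ are not named in the summary and are introduced here only for illustration), the fused lasso estimate solves

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N}\Big(y_i - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=2}^{p} |\beta_j - \beta_{j-1}|,
$$

where the first penalty drives individual coefficients to 0 and the second drives successive coefficients to be equal, producing the locally constant coefficient profile mentioned above.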
We consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to $\sum|\beta_j|^{\gamma}$ for some $\gamma > 0$. These estimators include the Lasso as a special case when $\gamma = 1$. Under appropriate conditions, we show that the limiting distributions can have positive probability mass at 0 when the true value of the parameter is 0. We also consider asymptotics for "nearly singular" designs.
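Written out, the criterion described here is

$$
\hat{\beta}_n = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^{T}\beta)^{2} + \lambda_n \sum_{j=1}^{p} |\beta_j|^{\gamma}, \qquad \gamma > 0,
$$

where $\lambda_n \ge 0$ is the proportionality constant of the penalty; $\gamma = 1$ gives the Lasso and $\gamma = 2$ gives ridge regression.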
It is well known that $L_1$-estimators of regression parameters are asymptotically normal if the distribution function has a positive derivative at 0. In this paper, we derive the asymptotic distributions under more general conditions on the behavior of the distribution function near 0.
Summary We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when the prediction rule is applied to permuted versions of the data set. This criterion can be applied to general prediction problems (e.g. regression or classification) and to general prediction rules (e.g. stepwise regression, tree-based models and neural nets). As a by-product we obtain a measure of the effective number of parameters used by an adaptive procedure. We relate the covariance inflation criterion to other model selection procedures and illustrate its use in some regression and classification problems. We also revisit the conditional bootstrap approach to model selection.
Abstract We consider the asymptotic behaviour of least-squares and M-estimates of the autoregressive parameter when the process is an infinite-variance random walk. It is shown that certain M-estimates converge faster than least-squares estimates and that they are also asymptotically normal.
Athreya showed that the bootstrap distribution of a sum of infinite variance random variables did not (with probability 1) tend weakly to a fixed distribution but instead tended in distribution to a random distribution. In this paper, we give a different proof of Athreya's result motivated by a heuristic large sample representation of the bootstrap distribution.
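A small simulation makes the phenomenon concrete. The sketch below is an illustration only, with Cauchy data (infinite variance, stable index 1); it computes the bootstrap distribution of the centred, normalised sum for two independent samples of the same size, and the two bootstrap distributions differ noticeably, consistent with the limit being a random distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 2000, 2000

def bootstrap_sum_distribution(x, B, rng):
    """Bootstrap distribution of the centred, normalised sum of a sample x."""
    n = len(x)
    idx = rng.integers(0, n, size=(B, n))
    resampled = x[idx]
    # Normalise as for a stable law with index 1 (Cauchy): scale by n.
    return (resampled.sum(axis=1) - x.sum()) / n

# Two independent Cauchy samples of the same size give visibly different
# bootstrap distributions, illustrating that the bootstrap law is itself random.
for rep in range(2):
    x = rng.standard_cauchy(n)
    boot = bootstrap_sum_distribution(x, B, rng)
    print(rep, np.percentile(boot, [5, 50, 95]).round(2))
```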
Abstract We propose a bootstrap-based method for enhancing a search through a space of models. The technique is well suited to complex, adaptively fitted models; it provides a convenient method for finding better local minima and for resistant fitting. Applications to regression, classification, and density estimation are described. We also provide results on the asymptotic behavior of bumping estimates.
We consider the limiting distributions of M-estimates of an "autoregressive" parameter when the observations come from an integrated linear process with infinite variance innovations. It is shown that M-estimates are, asymptotically, infinitely more efficient than the least-squares estimator (in the sense that they have a faster rate of convergence) and are conditionally asymptotically normal.
This paper considers the asymptotic behavior of M-estimates in a dynamic linear regression model where the errors have infinite second moments but the exogenous regressors satisfy the standard assumptions. It is shown that under certain conditions, the estimates of the parameters corresponding to the exogenous regressors are asymptotically normal and converge to the true values at the standard $n^{-1/2}$ rate.
Abstract. Let $Y_n = \mu + \sum \beta_j (Y_{n-j} - \mu) + \varepsilon_n$ be a $p$th order autoregressive process with innovations $\{\varepsilon_n\}$ in the domain of attraction of a stable law with index $\alpha < 2$. Hannan and Kanter (1977) showed that when the location parameter $\mu$ is known, least squares estimates of the autoregressive parameters have a very fast rate of convergence; specifically, $N^{1/\delta}(\hat{\beta}_j - \beta_j) \to 0$ almost surely for $\delta > \alpha$. It is shown here that if $\mu$ is estimated by the sample mean, $N^{1/\delta}(\hat{\beta}_j - \beta_j) \to 0$ almost surely for $\delta > \max(1, \alpha)$. In addition, some statements are made regarding estimators of $\mu$ which will give the full (Hannan and Kanter) rate of convergence, in particular when $\alpha < 1$.
Suppose $\{X_n\}$ is a $p$th order autoregressive process with innovations in the domain of attraction of a stable law and the true order $p$ unknown. The estimate $\hat{p}$ of $p$ is chosen to minimize Akaike's information criterion over the integers $0, 1, \cdots, K$. It is shown that $\hat{p}$ is weakly consistent and the consistency is retained if $K \rightarrow \infty$ as $N \rightarrow \infty$ at a certain rate depending on the index of the stable law.
Abstract We consider the asymptotic behaviour of $L_1$-estimators in a linear regression under a very general form of heteroscedasticity. The limiting distributions of the estimators are derived under standard conditions on the design. We also consider the asymptotic behaviour of the bootstrap in the heteroscedastic model and show that it is consistent to first order only if the limiting distribution is normal.
Shrinkage estimation procedures such as ridge regression and the lasso have been proposed for stabilizing estimation in linear models when high collinearity exists in the design. In this paper, we consider asymptotic properties of shrinkage estimators in the case of "nearly singular" designs.
Statistical folklore says that sample quantiles from an i.i.d. sample are asymptotically normal. Although this is true under certain conditions, the asymptotic theory of sample quantiles is much richer. In this paper, some of the possibilities are explored.
It is well known that conventional Wald-type inference in the context of quantile regression is complicated by the need to construct estimates of the conditional densities of the response variables at the quantile of interest. This note explores the possibility of circumventing the need to construct conditional density estimates in this context with scale statistics that are explicitly inconsistent for the underlying conditional densities. This method of studentization leads conventional test statistics to have limiting distributions that are nonstandard but have the convenient feature of depending explicitly on the user's choice of smoothing parameter. These limiting distributions depend on the distribution of the conditioning variables but can be straightforwardly approximated by resampling.
It is well known that $L_1$-estimators of autoregressive parameters are asymptotically normal if the distribution function of the errors, $F(x)$, has $F'(0) = \lambda > 0$. In this paper, we derive limiting distributions of $L_1$-estimators under more general assumptions on $F$. Second-order representations are also derived.
We consider some asymptotic distribution theory for M-estimators of the parameters of a linear model whose errors are non-negative; these estimators are the solutions of constrained optimization problems and their asymptotic theory is non-standard. Under weak conditions on the distribution of the errors and on the design, we show that a large class of estimators have the same asymptotic distributions in the case of i.i.d. errors; however, this invariance does not hold under non-i.i.d. errors.
This paper introduces a novel way of differentiating a unit root from a stationary alternative. We write the model as consisting of zero and nonzero parameters. If the lagged dependent variable has a coefficient of zero, we know that the variable has a unit root. We exploit this property and treat this as a model selection problem. We show that Bridge estimators can select the correct model: they estimate the parameter on the lagged dependent variable as exactly zero when it is zero (nonstationarity), and when it is nonzero (stationarity) they estimate the coefficient with a standard normal limit. In this sense, we also extend the statistics literature, since that literature deals only with model selection among stationary variables. The reason that our methodology can outperform existing unit root tests with lag selection methods stems from the two-step nature of existing unit root tests. In our method, we select the optimal lag length and the unit root simultaneously. We show in simulations that this makes a big difference in terms of size and power.
We consider an approach to deriving Bahadur–Kiefer theorems based on a "delta method" for sequences of minimizers. This approach is used to derive Bahadur–Kiefer theorems for the sample median and other estimators.
In a linear regression model, the Dantzig selector (Candès and Tao, 2007) minimizes the L1 norm of the regression coefficients subject to a bound λ on the L∞ norm of the covariances between the predictors and the residuals; the resulting estimator is the solution of a linear program, which may be nonunique or unstable. We propose a regularized alternative to the Dantzig selector. These estimators (which depend on λ and an additional tuning parameter r) minimize objective functions that are the sum of the L1 norm of the regression coefficients plus r times the logarithmic potential function of the Dantzig selector constraints, and can be viewed as penalized analytic centers of the latter constraints. The tuning parameter r controls the smoothness of the estimators as functions of λ and, when λ is sufficiently large, the estimators depend approximately on r and λ via r/λ².
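In symbols, the Dantzig selector referred to here solves the linear program

$$
\min_{\beta} \|\beta\|_1 \quad \text{subject to} \quad \|X^{T}(y - X\beta)\|_{\infty} \le \lambda,
$$

and one natural reading of the penalized analytic-centre construction (a sketch of the idea rather than the paper's exact formulation) replaces the hard constraints by a logarithmic barrier,

$$
\min_{\beta} \|\beta\|_1 - r \sum_{j=1}^{p} \Big[ \log\big(\lambda - x_j^{T}(y - X\beta)\big) + \log\big(\lambda + x_j^{T}(y - X\beta)\big) \Big],
$$

so that r controls how strongly the estimate is pulled toward the analytic centre of the constraint set.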
Abstract The LASSO (least absolute shrinkage and selection operator) is a method of estimation in linear regression (and related) models that combines shrinkage estimation achieved in ridge regression with model selection where some (or all) parameter estimates are set to zero. The LASSO is capable of producing a "sparse" estimate of the parameter and can be used effectively even when the number of parameters exceeds the number of observations. LASSO estimates can be computed very efficiently using a number of methods, including coordinate descent methods. A number of generalizations of the LASSO have been proposed, including the fused LASSO, the group LASSO, and the adaptive LASSO.
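To make the coordinate descent computation mentioned above concrete, here is a minimal sketch, assuming centred data and the objective $\frac{1}{2n}\|y - X\beta\|^2 + \lambda\|\beta\|_1$; it is an illustration, not a production solver.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).mean(axis=0)          # (1/n) * X_j' X_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual that excludes the j-th predictor's contribution.
            residual += X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta

# Small synthetic check: only the first two coefficients are truly nonzero.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(100)
print(lasso_coordinate_descent(X, y, lam=0.1).round(2))
```

Each coordinate update is a closed-form soft-thresholding step, which is one reason the method remains practical even when the number of parameters exceeds the number of observations.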
The analytic center estimator is defined as the analytic center of the so-called membership set. In this paper, we consider the asymptotics of this estimator under fairly general assumptions on the noise distribution.
SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
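For readers who want to experiment, the Lagrangian counterpart of the constrained problem described above is available in standard software. The sketch below uses scikit-learn's Lasso, which minimizes $\frac{1}{2n}\|y - X\beta\|^2 + \alpha\|\beta\|_1$; the data and the value of alpha are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 8))
y = 3 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(60)

# alpha plays the role of the Lagrange multiplier for the lasso bound;
# larger alpha forces more coefficients to exactly zero.
fit = Lasso(alpha=0.2).fit(X, y)
print(fit.coef_.round(2), round(fit.intercept_, 2))
```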
Abstract In the general linear model with independent and identically distributed errors and distribution function F, the estimator which minimizes the sum of absolute residuals is demonstrated to be consistent and asymptotically Gaussian with covariance matrix $\omega^2 Q^{-1}$, where $Q = \lim T^{-1}X'X$ and $\omega^2$ is the asymptotic variance of the ordinary sample median from samples with distribution F. Thus the least absolute error estimator has strictly smaller asymptotic confidence ellipsoids than the least squares estimator for linear models from any F for which the sample median is a more efficient estimator of location than the sample mean.
The LAD estimator of the vector parameter in a linear regression is defined by minimizing the sum of the absolute values of the residuals. This paper provides a direct proof of asymptotic normality for the LAD estimator. The main theorem assumes deterministic carriers. The extension to random carriers includes the case of autoregressions whose error terms have finite second moments. For a first-order autoregression with Cauchy errors the LAD estimator is shown to converge at a $1/n$ rate.
Limit theorems for an $M$-estimate constrained to lie in a closed subset of $\mathbb{R}^d$ are given under two different sets of regularity conditions. A consistent sequence of global optimizers converges under Chernoff regularity of the parameter set. A $\sqrt n$-consistent sequence of local optimizers converges under Clarke regularity of the parameter set. In either case the asymptotic distribution is a projection of a normal random vector on the tangent cone of the parameter set at the true parameter value. Limit theorems for the optimal value are also obtained, agreeing with Chernoff's result in the case of maximum likelihood with global optimizers.
SUMMARY With ideal spatial adaptation, an oracle furnishes information about how best to adapt a spatially variable estimator, whether piecewise constant, piecewise polynomial, variable knot spline, or variable bandwidth kernel, to the unknown function. Estimation with the aid of an oracle offers dramatic advantages over traditional linear estimation by nonadaptive kernels; however, it is a priori unclear whether such performance can be obtained by a procedure relying on the data alone. We describe a new principle for spatially adaptive estimation: selective wavelet reconstruction. We show that variable-knot spline fits and piecewise-polynomial fits, when equipped with an oracle to select the knots, are not dramatically more powerful than selective wavelet reconstruction with an oracle. We develop a practical spatially adaptive method, RiskShrink, which works by shrinkage of empirical wavelet coefficients. RiskShrink mimics the performance of an oracle for selective wavelet reconstruction as well as it is possible to do so. A new inequality in multivariate normal decision theory which we call the oracle inequality shows that attained performance differs from ideal performance by at most a factor of approximately $2\log n$, where n is the sample size. Moreover no estimator can give a better guarantee than this. Within the class of spatially adaptive procedures, RiskShrink is essentially optimal. Relying only on the data, it comes within a factor $\log^2 n$ of the performance of piecewise polynomial and variable-knot spline methods equipped with an oracle. In contrast, it is unknown how or if piecewise polynomial methods could be made to function this well when denied access to an oracle and forced to rely on data alone.
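The shrinkage rule at the heart of this approach can be illustrated in a few lines. The sketch below applies soft thresholding with the universal threshold $\sigma\sqrt{2\log n}$ to simulated noisy coefficients; RiskShrink itself uses minimax thresholds, so this is only an illustration of the mechanism, not the paper's exact procedure.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink empirical coefficients toward zero: sign(w) * max(|w| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

rng = np.random.default_rng(0)
n, sigma = 1024, 1.0
theta = np.zeros(n)
theta[:20] = rng.uniform(3, 6, size=20)        # a sparse "true" coefficient vector
w = theta + sigma * rng.standard_normal(n)     # empirical (noisy) coefficients

t = sigma * np.sqrt(2 * np.log(n))             # universal threshold sigma * sqrt(2 log n)
theta_hat = soft_threshold(w, t)

# Mean squared error of the thresholded coefficients vs. the raw noisy ones.
print(np.mean((theta_hat - theta) ** 2), np.mean((w - theta) ** 2))
```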
Chemometrics is a field of chemistry that studies the application of statistical methods to chemical data analysis. In addition to borrowing many techniques from the statistics and engineering literatures, chemometrics itself has given rise to several new data-analytical methods. This article examines two methods commonly used in chemometrics for predictive modeling, partial least squares and principal components regression, from a statistical perspective. The goal is to try to understand their apparent successes and in what situations they can be expected to work well and to compare them with other statistical methods intended for those situations. These methods include ordinary least squares, variable subset selection, and ridge regression.
In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
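The procedure described amounts to the estimator $\hat{\beta}(k) = (X'X + kI)^{-1}X'y$. The sketch below, with made-up nearly collinear data, computes it for several values of k; watching how the coefficients change with k is essentially the information a ridge trace displays.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator: add k >= 0 to the diagonal of X'X before solving."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(50)   # nearly collinear columns
y = X[:, 0] + rng.standard_normal(50)

for k in (0.0, 0.1, 1.0, 10.0):   # k = 0 reproduces ordinary least squares
    print(k, ridge(X, y, k).round(2))
```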
This paper proposes the least absolute shrinkage and selection operator–type (Lasso-type) generalized method of moments (GMM) estimator. This Lasso-type estimator is formed by the GMM objective function with the addition of a penalty term. The exponent of the penalty term in the regular Lasso estimator is equal to one. However, the exponent of the penalty term in the Lasso-type estimator is less than one in the analysis here. The magnitude of the exponent is reduced to avoid the asymptotic bias. This estimator selects the correct model and estimates it simultaneously. In other words, this method estimates the redundant parameters as zero in large samples and provides the standard GMM limit distribution for the estimates of the nonzero parameters in the model. The asymptotic theory for our estimator is nonstandard. We conduct a simulation study which shows that the Lasso-type GMM correctly selects the true model much more often than the Bayesian information criterion (BIC) and another model selection procedure based on the GMM objective function.
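In symbols, a criterion of the kind described here (the exact weighting and scaling are not given in the abstract, so this is only a sketch) is

$$
\hat{\theta} = \arg\min_{\theta} \; n\, \bar{g}_n(\theta)^{T} W_n \bar{g}_n(\theta) + \lambda_n \sum_{j=1}^{p} |\theta_j|^{\gamma}, \qquad 0 < \gamma < 1,
$$

where $\bar{g}_n(\theta)$ is the vector of sample moment conditions and $W_n$ a weight matrix; taking the exponent $\gamma$ below one is what avoids the asymptotic bias while still setting redundant parameters exactly to zero.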
Let $X_1, X_2, \cdots$ be i.i.d. random variables whose common distribution function $F$ is in the domain of attraction of a nonnormal stable distribution. A simple, probabilistic proof of the convergence of the normalized partial sums to the stable distribution is given. The proof makes use of an elementary property of order statistics and clarifies the manner in which the largest few summands determine the limiting distribution. The method is applied to determine the limiting distribution of self-norming sums and deduce a representation for the limiting distribution. The representation affords an explanation of the infinite discontinuities of the limiting densities which occur in some cases. Application of the technique to prove weak convergence in a separable Hilbert space is explored.
Abstract Edgeworth and bootstrap approximations to estimator distributions in $L_1$ regression are described. Analytic approximations based on Edgeworth expansions that mix lattice and nonlattice components and allow for an intercept term in the regression are developed under mild conditions, which do not even require a density for the error distribution. Under stronger assumptions on the error distribution, the Edgeworth expansion assumes a simpler form. Bootstrap approximations are described, and the consistency of the bootstrap in the $L_1$ regression setting is established. We show how the slow rate $n^{-1/4}$ of convergence in this context of the standard, unsmoothed bootstrap that resamples from the raw residuals may be improved to rate $n^{-2/5}$ by two methods: a smoothed bootstrap approach based on resampling from an appropriate kernel estimator of the error density and a normal approximation that uses a kernel estimator of the error density at a particular point, its median 0. Both of these methods require choice of a smoothing bandwidth, however. Numerical illustrations of the comparative performances of the different estimators in small samples are given, and simple but effective empirical rules for choice of smoothing bandwidth are suggested.
The lasso is a popular technique for simultaneous estimation and variable selection. Lasso variable selection has been shown to be consistent under certain conditions. In this work we derive a necessary condition for the lasso variable selection to be consistent. Consequently, there exist certain scenarios where the lasso is inconsistent for variable selection. We then propose a new version of the lasso, called the adaptive lasso, where adaptive weights are used for penalizing different coefficients in the $\ell_1$ penalty. We show that the adaptive lasso enjoys the oracle properties; namely, it performs as well as if the true underlying model were given in advance. Similar to the lasso, the adaptive lasso is shown to be near-minimax optimal. Furthermore, the adaptive lasso can be solved by the same efficient algorithm for solving the lasso. We also discuss the extension of the adaptive lasso in generalized linear models and show that the oracle properties still hold under mild regularity conditions. As a byproduct of our theory, the nonnegative garotte is shown to be consistent for variable selection.
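A common way to compute the adaptive lasso is to rescale the design by the adaptive weights and then call an ordinary lasso solver. The sketch below does this with scikit-learn; the choice of initial estimator, gamma, and alpha are illustrative assumptions, not prescriptions from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(n)

# Step 1: an initial consistent estimate (here, ordinary least squares).
beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_

# Step 2: adaptive weights w_j = 1 / |beta_init_j| ** gamma.
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)

# Step 3: solve a plain lasso on the rescaled columns X_j / w_j, then undo the scaling.
Xw = X / w
fit = Lasso(alpha=0.05, fit_intercept=False).fit(Xw, y)
beta_adaptive = fit.coef_ / w
print(beta_adaptive.round(2))
```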
We discuss the following problem: given a random sample $\mathbf{X} = (X_1, X_2, \cdots, X_n)$ from an unknown probability distribution $F$, estimate the sampling distribution of some prespecified random variable $R(\mathbf{X}, F)$, on the basis of the observed data $\mathbf{x}$. (Standard jackknife theory gives an approximate mean and variance in the case $R(\mathbf{X}, F) = \theta(\hat{F}) - \theta(F), \theta$ some parameter of interest.) A general method, called the "bootstrap," is introduced, and shown to work satisfactorily on a variety of estimation problems. The jackknife is shown to be a linear approximation method for the bootstrap. The exposition proceeds by a series of examples: variance of the sample median, error rates in a linear discriminant analysis, ratio estimation, estimating regression parameters, etc.
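The first example mentioned, the variance of the sample median, is easy to reproduce. The sketch below (illustrative sample size and replication count) compares a bootstrap variance estimate with the large-sample approximation $\pi/(2n)$ for standard normal data.

```python
import numpy as np

def bootstrap_variance(x, stat, B=2000, rng=None):
    """Bootstrap estimate of the sampling variance of stat applied to x."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(x)
    replicates = np.array([stat(x[rng.integers(0, n, n)]) for _ in range(B)])
    return replicates.var(ddof=1)

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
print(bootstrap_variance(x, np.median, rng=rng))
# Large-sample approximation pi / (2n) for the median of an N(0, 1) sample:
print(np.pi / (2 * len(x)))
```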
We show that regression quantiles, which could be computed as solutions of a linear programming problem, and the solutions of the corresponding dual problem, which we call the regression rank-scores, generalize the duality of order statistics and of ranks from the location to the linear model. Noting this fact, we study the regression quantile and regression rank-score processes in the heteroscedastic linear regression model, obtaining some new estimators and interesting comparisons with existing estimators.
Local limit theorems for lattice random variables. A. B. Mukhin, Theory of Probability & Its Applications 36(4):698–713, 1992. doi:10.1137/1136086.
We establish a new functional central limit theorem for empirical processes indexed by classes of functions. In a neighborhood of a fixed parameter point, an $n^{-1/3}$ rescaling of the parameter is compensated for by an $n^{2/3}$ rescaling of the empirical measure, resulting in a limiting Gaussian process. By means of a modified continuous mapping theorem for the location of the maximizing value, we deduce limit theorems for several statistics defined by maximization or constrained minimization of a process derived from the empirical measure. These statistics include the shorth, Rousseeuw's least median of squares estimator, Manski's maximum score estimator, and the maximum likelihood estimator for a monotone density. The limit theory depends on a simple new sufficient condition for a Gaussian process to achieve its maximum almost surely at a unique point.
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
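As an informal illustration of the path property described in (1) (a sketch using scikit-learn's independent lars_path implementation, not the authors' published code; the data are simulated purely for the example), the LARS modification with method="lasso" returns every breakpoint of the piecewise-linear Lasso coefficient path in a single pass:

import numpy as np
from sklearn.linear_model import lars_path

# Simulated data, purely illustrative
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
beta = np.zeros(10)
beta[:3] = [3.0, -2.0, 1.5]        # only three covariates are truly active
y = X @ beta + rng.standard_normal(100)

# method="lasso" applies the LARS modification that traces the full Lasso path
alphas, active, coefs = lars_path(X, y, method="lasso")

print(coefs.shape)   # (10, number of breakpoints): one column of coefficients per breakpoint
print(active)        # order in which variables enter the active set

The whole path costs roughly one least squares fit on the full covariate set, which is the computational point made in the abstract.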
Variable selection is fundamental to high-dimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized likelihood approaches are proposed to handle these kinds of problems. The proposed methods select variables and estimate coefficients simultaneously. Hence they enable us to construct confidence intervals for estimated parameters. The proposed approaches are distinguished from others in that the penalty functions are symmetric, nonconcave on (0, ∞), and have singularities at the origin to produce sparse solutions. Furthermore, the penalty functions should be bounded by a constant to reduce bias and satisfy certain conditions to yield continuous solutions. A new algorithm is proposed for optimizing penalized likelihood functions. The proposed ideas are widely applicable. They are readily applied to a variety of parametric models such as generalized linear models and robust regression models. They can also be applied easily to nonparametric modeling by using wavelets and splines. Rates of convergence of the proposed penalized likelihood estimators are established. Furthermore, with proper choice of regularization parameters, we show that the proposed estimators perform as well as the oracle procedure in variable selection; namely, they work as well as if the correct submodel were known. Our simulation shows that the newly proposed methods compare favorably with other variable selection techniques. Furthermore, the standard error formulas are tested to be accurate enough for practical applications.
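For orientation, the penalty most closely associated with this proposal, the SCAD penalty, is usually quoted through its derivative; the display below is the commonly stated form and is given only as a reminder, not copied from the abstract:

$$ p_\lambda'(\theta) \;=\; \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}, \qquad \theta > 0, $$

with $a > 2$ (the value $a = 3.7$ is often suggested). The singularity at the origin produces exact zeros, while the levelling off of the penalty for large $|\theta|$ is what keeps the bias of large coefficients bounded.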
In multiple regression it is shown that parameter estimates based on minimum residual sum of squares have a high probability of being unsatisfactory, if not incorrect, if the prediction vectors are not orthogonal. Proposed is an estimation procedure based on adding small positive quantities to the diagonal of X′X. Introduced is the ridge trace, a method for showing in two dimensions the effects of nonorthogonality. It is then shown how to augment X′X to obtain biased estimates with smaller mean square error.
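A minimal numerical sketch of the proposal (illustrative code only; the function name ridge_estimate and the simulated data are not from the paper): the estimator augments the diagonal of X′X, and evaluating it over a grid of ridge constants reproduces the ridge trace.

import numpy as np

def ridge_estimate(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for a single ridge constant k."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Predictors made deliberately non-orthogonal (nearly collinear)
rng = np.random.default_rng(1)
z = rng.standard_normal(50)
X = np.column_stack([z + 0.05 * rng.standard_normal(50) for _ in range(3)])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.standard_normal(50)

# Ridge trace: coefficient vectors as k grows from 0 (least squares) upward
for k in [0.0, 0.01, 0.1, 1.0, 10.0]:
    print(k, np.round(ridge_estimate(X, y, k), 3))

The wild least squares coefficients at k = 0 stabilize quickly as k increases, which is the behaviour the ridge trace is designed to display.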
Weak Convergence in Metric Spaces. The Space C. The Space D. Dependent Variables. Other Modes of Convergence. Appendix. Some Notes on the Problems. Bibliographical Notes. Bibliography. Index.
In this paper Chow and Robbins' (1965) sequential theory has been extended to construct a confidence region with prescribed maximum width and prescribed coverage probability for the linear regression parameters under weaker conditions than Srivastava (1967), Albert (1966), and Gleser (1965). An extension to the multivariate case has also been carried out.
Summary In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
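For reference, the estimator under review solves the constrained least squares problem (standard formulation, reproduced only as a reminder):

$$ \hat{\beta} = \arg\min_{\beta}\; \sum_{i=1}^{N}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2 \quad \text{subject to} \quad \sum_{j=1}^{p}|\beta_j| \le t, $$

or, equivalently, in Lagrangian form with an added penalty $\lambda \sum_j |\beta_j|$; the bound $t$ (or $\lambda$) controls both the amount of shrinkage and how many coefficients are set exactly to zero.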
Abstract Proposed by Tibshirani, the least absolute shrinkage and selection operator (LASSO) estimates a vector of regression coefficients by minimizing the residual sum of squares subject to a constraint on the l1-norm of the coefficient vector. The LASSO estimator typically has one or more zero elements and thus shares characteristics of both shrinkage estimation and variable selection. In this article we treat the LASSO as a convex programming problem and derive its dual. Consideration of the primal and dual problems together leads to important new insights into the characteristics of the LASSO estimator and to an improved method for estimating its covariance matrix. Using these results we also develop an efficient algorithm for computing LASSO estimates which is usable even in cases where the number of regressors exceeds the number of observations. An S-Plus library based on this algorithm is available from StatLib.
Abstract In this article, we consider bootstrapping the Lasso estimator of the regression parameter in a multiple linear regression model. It is known that the standard bootstrap method fails to be consistent. Here, we propose a modified bootstrap method, and show that it provides valid approximation to the distribution of the Lasso estimator, for all possible values of the unknown regression parameter vector, including the case where some of the components are zero. Further, we establish consistency of the modified bootstrap method for estimating the asymptotic bias and variance of the Lasso estimator. We also show that the residual bootstrap can be used to consistently estimate the distribution and variance of the adaptive Lasso estimator. Using the former result, we formulate a novel data-based method for choosing the optimal penalizing parameter for the Lasso using the modified bootstrap. A numerical study is performed to investigate the finite sample performance of the modified bootstrap. The methodology proposed in the article is illustrated with a real data example. Keywords: bootstrap variance estimation; penalized regression; regularization; shrinkage.
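As a rough illustration of residual resampling for the Lasso (a plain residual bootstrap sketched with scikit-learn; the modified bootstrap of the article additionally thresholds the estimator before generating bootstrap data, which is not reproduced here, and the penalty level below is arbitrary):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 80, 5
X = rng.standard_normal((n, p))
y = X @ np.array([2.0, 0.0, 0.0, 1.0, 0.0]) + rng.standard_normal(n)

lam = 0.1
fit = Lasso(alpha=lam).fit(X, y)
resid = y - fit.predict(X)
resid = resid - resid.mean()                 # center residuals before resampling

boot_coefs = []
for _ in range(200):
    y_star = fit.predict(X) + rng.choice(resid, size=n, replace=True)
    boot_coefs.append(Lasso(alpha=lam).fit(X, y_star).coef_)
boot_coefs = np.array(boot_coefs)

# Naive bootstrap estimate of the variability of each Lasso coefficient
print(boot_coefs.std(axis=0))

The article's point is precisely that a naive scheme of this kind is inconsistent when some true coefficients are zero, and that a modification is needed to restore validity.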
SUMMARY The paper addresses the evergreen problem of construction of regressors for use in least squares multiple regression. In the context of a general sequential procedure for doing this, it is shown that, with a particular objective criterion for the construction, the procedures of ordinary least squares and principal components regression occupy the opposite ends of a continuous spectrum, with partial least squares lying in between. There are two adjustable 'parameters' controlling the procedure: 'alpha', in the continuum [0, 1], and 'omega', the number of regressors finally accepted. These control parameters are chosen by cross-validation. The method is illustrated by a range of examples of its application.
Abstract Bridge regression, a special family of penalized regressions with a penalty function Σ|βj|γ with γ ≤ 1, is considered. A general approach to solve for the bridge estimator is developed. A new algorithm for the lasso (γ = 1) is obtained by studying the structure of the bridge estimators. The shrinkage parameter γ and the tuning parameter λ are selected via generalized cross-validation (GCV). Comparison between the bridge model (γ ≤ 1) and several other shrinkage models, namely the ordinary least squares regression (λ = 0), the lasso (γ = 1) and ridge regression (γ = 2), is made through a simulation study. It is shown that the bridge regression performs well compared to the lasso and ridge regression. These methods are demonstrated through an analysis of prostate cancer data. Some computational advantages and limitations are discussed.
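In symbols, the bridge family penalizes the residual sum of squares by an $\ell_\gamma$-type term (generic form, written out only for orientation):

$$ \hat{\beta}^{\text{bridge}} \;=\; \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j|^{\gamma}, $$

so that $\gamma = 1$ recovers the lasso, $\gamma = 2$ ridge regression, and $\lambda = 0$ ordinary least squares, which is how the comparison in the abstract is organized.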
A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed in Fan and Li (2001a). It has been shown there that the resulting procedures perform as well as if the subset of significant variables were known in advance. Such a property is called an oracle property. The proposed procedures were illustrated in the context of linear regression, robust linear regression and generalized linear models. In this paper, the nonconcave penalized likelihood approach is extended further to the Cox proportional hazards model and the Cox proportional hazards frailty model, two commonly used semi-parametric models in survival analysis. As a result, new variable selection procedures for these two commonly-used models are proposed. It is demonstrated how the rates of convergence depend on the regularization parameter in the penalty function. Further, with a proper choice of the regularization parameter and the penalty function, the proposed estimators possess an oracle property. Standard error formulae are derived and their accuracies are empirically tested. Simulation studies show that the proposed procedures are more stable in prediction and more effective in computation than the best subset variable selection, and they reduce model complexity as effectively as the best subset variable selection. Compared with the LASSO, which is the penalized likelihood method with the $L_1$-penalty, proposed by Tibshirani, the newly proposed approaches have better theoretic properties and finite sample performance.
This chapter reviews the theory of extremal quantile regression. It shows that each of the sequences produces a different asymptotic approximation to the distribution of the quantile regression estimators. The chapter also reviews models for marginal and conditional extreme quantiles. It describes estimation and inference methods for extreme quantile models. It presents two empirical applications of extremal quantile regression to conditional VaR and financial contagion. The chapter provides typical modeling assumptions in extremal quantile regression. It discusses the estimation and inference methods for extremal quantile regression. Very low birthweights are connected with subsequent health problems and therefore extremal quantile regression can help identify factors to improve adult health outcomes. Zhang employed extremal quantile regression methods to estimate tail quantile treatment effects under a selection on observables assumption. The work of Portnoy and Koenker and Gutenbrunner et al. implicitly contained some results on extending the normal approximations to intermediate order regression quantiles in location models.
The least-absolute-deviations (LAD) estimator for a median-regression model does not satisfy the standard conditions for obtaining asymptotic refinements through use of the bootstrap because the LAD objective function is not smooth. This paper overcomes this problem by smoothing the objective function. The smoothed estimator is asymptotically equivalent to the standard LAD estimator. With bootstrap critical values, the rejection probabilities of symmetrical $t$ and $\chi^2$ tests based on the smoothed estimator are correct through $O(n^{-\gamma})$ under the null hypothesis, where $\gamma < 1$ but can be arbitrarily close to 1. In contrast, first-order asymptotic approximations make errors of size $O(n^{-1/2})$. These results also hold for symmetrical $t$ and $\chi^2$ tests for censored median regression models.
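One standard way to smooth a median-regression criterion, shown schematically here because the paper's exact construction and regularity conditions are more detailed, starts from the identity $|u| = u\,(2\,I(u \ge 0) - 1)$ and replaces the indicator with a smooth distribution-function-like kernel $K$ and a bandwidth $h_n \to 0$:

$$ S_n(b) \;=\; \frac{1}{n}\sum_{i=1}^{n} (y_i - x_i'b)\,\bigl[\,2K\!\bigl((y_i - x_i'b)/h_n\bigr) - 1\,\bigr]. $$

Minimizing $S_n$ gives an estimator asymptotically equivalent to LAD but with a smooth objective, which is what makes bootstrap refinements of the usual Edgeworth type tractable.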
Let $\{Z_k, -\infty < k < \infty\}$ be iid where the $Z_k$'s have regularly varying tail probabilities. Under mild conditions on a real sequence $\{c_j, j \geq 0\}$ the stationary process $\{X_n := \sum^\infty_{j=0} c_jZ_{n-j}, n \geq 1\}$ exists. A point process based on $\{X_n\}$ converges weakly and from this, a host of weak limit results for functionals of $\{X_n\}$ ensue. We study sums, extremes, exceedances and first passages as well as behavior of sample covariance functions.
By means of two simple convexity arguments we are able to develop a general method for proving consistency and asymptotic normality of estimators that are defined by minimisation of convex criterion functions. This method is then applied to a fair range of different statistical estimation problems, including Cox regression, logistic and Poisson regression, least absolute deviation regression outside model conditions, and pseudo-likelihood estimation for Markov chains. Our paper has two aims. The first is to exposit the method itself, which in many cases, under reasonable regularity conditions, leads to new proofs that are simpler than the traditional proofs. Our second aim is to exploit the method to its limits for logistic regression and Cox regression, where we seek asymptotic results under as weak regularity conditions as possible. For Cox regression in particular we are able to weaken previously published regularity conditions substantially.
Let $X_t = \sum^\infty_{j=-\infty} c_jZ_{t-j}$ be a moving average process where the $Z_t$'s are iid and have regularly varying tail probabilities with index $\alpha > 0$. The limit distribution of the sample covariance function is derived in the case that the process has a finite variance but an infinite fourth moment. Furthermore, in the infinite variance case $(0 < \alpha < 2)$, the sample correlation function is shown to converge in distribution to the ratio of two independent stable random variables with indices $\alpha$ and $\alpha/2$, respectively. This result immediately gives the limit distribution for the least squares estimates of the parameters in an autoregressive process.
In the general regression model $y_i = x'_i \beta + e_i$, for $i = 1, \cdots, n$ and $\beta \in {\bf R}^p$, the "regression quantile" $\hat{\beta}(\theta)$ estimates the coefficients of the linear regression function parallel to $x'_i \beta$ and roughly lying above a fraction $\theta$ of the data. As introduced by Koenker and Bassett [Econometrica, 46 (1978), pp. 33–50], these regression quantiles are analogous to order statistics and provide a natural and appealing approach to the analysis of the general linear model. Computation of $\hat{\beta}(\theta)$ can be expressed as a parametric linear programming problem with $J_n$ distinct extremal solutions as $\theta$ goes from zero to one. That is, there will be $J_n$ breakpoints $\{\theta_j\}$, for $j = 1, \cdots, J_n$, such that $\hat{\beta}(\theta_j)$ is obtained from $\hat{\beta}(\theta_{j-1})$ by a single simplex pivot. Each $\hat{\beta}(\theta_j)$ is characterized by a specific subset of $p$ observations. Although no previous result restricts $J_n$ to be less than the upper bound $\binom{n}{p} = O(n^p)$, practical experience suggests that $J_n$ grows roughly linearly with $n$. Here it is shown that, in fact, $J_n = O(n \log n)$ in probability, where the distributional assumptions are those typical of multiple regression situations. The result is based on a probabilistic rather than combinatoric approach which should have general application to the probabilistic behavior of the number of pivots in (parametric) linear programming problems. The conditions are roughly that the constraint coefficients form random independent vectors, and that the number of variables is fixed while the number of constraints tends to infinity.
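For reference, the Koenker–Bassett regression quantile is the minimizer of a convex, piecewise-linear criterion, which is why the parametric linear programming view applies (standard definition, not quoted from the paper):

$$ \hat{\beta}(\theta) \;=\; \arg\min_{b \in {\bf R}^p}\; \sum_{i=1}^{n} \rho_\theta(y_i - x_i'b), \qquad \rho_\theta(u) = u\,\bigl(\theta - I(u < 0)\bigr), $$

and each breakpoint $\theta_j$ of the solution path corresponds to a change of basis (a single simplex pivot) in the associated linear program.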
This Monte Carlo study examines several estimation procedures of the asymptotic covariance matrix in the quantile and censored quantile regression models: design matrix bootstrap, error bootstrapping, order statistic, sigma bootstrap, homoskedastic kernel, and heteroskedastic kernel. The Monte Carlo samples are drawn from two alternative data sets: (a) the unaltered Current Population Survey (CPS) for 1987 and (b) this CPS data with independence between error term and regressors imposed.
Partial least squares (PLS) modeling is an algorithm for relating one or more dependent variables to two or more independent variables. As a regression procedure it apparently evolved from the method of principal components regression (PCR) using the NIPALS algorithm, which is similar to the power method for determining the eigenvectors and eigenvalues of a matrix. This paper presents a theoretical explanation of the PLS algorithm using singular value decomposition and the power method. The relation of PLS to PCR is demonstrated, and PLS is shown to be one of a continuum of possible solutions of a similar type. These other solutions may give better prediction than either PLS or PCR under appropriate conditions.
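A compact sketch of the NIPALS-style construction for a single response (illustrative only; the function name pls1_nipals and the data are invented here, and the paper's own development proceeds through the singular value decomposition and the power method):

import numpy as np

def pls1_nipals(X, y, n_components):
    """Bare-bones PLS1: extract components from X'y weights and deflate."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    T, W, P, q = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)      # weight: direction of maximal covariance with y
        t = X @ w                      # score vector
        p = X.T @ t / (t @ t)          # X loading
        c = y @ t / (t @ t)            # y loading
        X = X - np.outer(t, p)         # deflate X
        y = y - t * c                  # deflate y
        T.append(t); W.append(w); P.append(p); q.append(c)
    return np.array(T).T, np.array(W).T, np.array(P).T, np.array(q)

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 6))
y = X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(30)
T, W, P, q = pls1_nipals(X, y, n_components=2)
print(T.shape, W.shape)    # (30, 2) (6, 2)

Replacing the covariance-based weight X'y by the leading eigenvector of X'X would give principal components regression instead, which is the continuum of solutions the abstract refers to.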
SUMMARY The asymptotic variance of a sample quantile depends on the value of the population density at the population quantile. Therefore Studentizing a sample quantile involves density estimation, either explicitly or implicitly. One popular way of Studentizing is to use the Siddiqui–Bloch–Gastwirth estimator, whose construction depends crucially on the choice of a smoothing parameter m. We examine the effect which the selection of m has on the level error of tests or confidence intervals based on Studentized quantiles and show that, if we wish to minimize this error, m should be of a smaller order of magnitude than is recommended by squared error theory. The cases of one- and two-sided procedures are distinctly different, the former being less sensitive to the choice of m.
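A small sketch of the Siddiqui–Bloch–Gastwirth idea (one common version; index conventions at the boundaries vary between references, and the helper name sbg_quantile_se is invented here): the density at the quantile is estimated from the spacing of the order statistics m positions on either side of the sample quantile, and this feeds directly into the Studentization.

import numpy as np

def sbg_quantile_se(x, p, m):
    """Standard error of the p-th sample quantile using a
    Siddiqui-Bloch-Gastwirth density estimate (crude boundary handling)."""
    x = np.sort(np.asarray(x))
    n = len(x)
    r = int(np.ceil(n * p))                               # index of the sample quantile
    lo, hi = max(r - m, 1), min(r + m, n)
    f_hat = (hi - lo) / (n * (x[hi - 1] - x[lo - 1]))     # density estimate at the quantile
    return np.sqrt(p * (1 - p) / n) / f_hat

rng = np.random.default_rng(4)
x = rng.standard_normal(500)
print(sbg_quantile_se(x, p=0.5, m=25))   # for N(0,1) data the target value is about 0.056

The choice of m trades variance against bias in f_hat; the point of the abstract is that the m minimizing squared error of f_hat is not the m that minimizes the level error of the resulting tests and intervals.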
Efron's "bootstrap" method of distribution approximation is shown to be asymptotically valid in a large number of situations, including $t$-statistics, the empirical and quantile processes, and von Mises functionals. Some counter-examples are also given, to show that the approximation does not always succeed.
Necessary and sufficient conditions for the weak consistency of the sample median