The Dantzig selector: Statistical estimation when p is much larger than n

Type: Article

Publication Date: 2007-12-01

Citations: 2388

DOI: https://doi.org/10.1214/009053606000001523

Abstract

In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Xβ + z, where β ∈ R^p is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n ≪ p, and the z_i's are i.i.d. N(0, σ²). Is it possible to estimate β reliably based on the noisy data y? To estimate β, we introduce a new estimator, the Dantzig selector, which is a solution to the ℓ1-regularization problem $$\min_{\tilde{\beta}\in\mathbf{R}^{p}}\|\tilde{\beta}\|_{\ell_{1}}\quad\mbox{subject to}\quad \|X^{*}r\|_{\ell_{\infty}}\leq(1+t^{-1})\sqrt{2\log p}\cdot\sigma,$$ where r = y − Xβ̃ is the residual vector and t is a positive scalar. We show that if X obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector β is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability $$\|\hat{\beta}-\beta\|_{\ell_{2}}^{2}\leq C^{2}\cdot 2\log p\cdot\Bigl(\sigma^{2}+\sum_{i}\min(\beta_{i}^{2},\sigma^{2})\Bigr).$$ Our results are nonasymptotic and we give values for the constant C. Even though n may be much smaller than p, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle that supplies perfect information about which coordinates are nonzero and which are above the noise level. In multivariate regression and from a model selection viewpoint, our result says that it is possible to nearly select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program (LP).
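
As the abstract notes, the Dantzig selector can be recast as a linear program: writing β = u − v with u, v ≥ 0 makes the ℓ1 objective linear and turns the ℓ∞ constraint on X*r into 2p linear inequalities. The sketch below is illustrative only and is not code from the paper; it assumes NumPy and SciPy's HiGHS LP solver, and the helper name dantzig_selector, the choice of t, and the toy problem sizes are our own.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Minimize ||beta||_1 subject to ||X^T (y - X beta)||_inf <= lam,
    via the standard split beta = u - v with u, v >= 0 (a linear program).
    Illustrative sketch, not the authors' code."""
    n, p = X.shape
    A = X.T @ X                      # Gram matrix, p x p
    b = X.T @ y                      # correlations with the response
    # ||X^T(y - X beta)||_inf <= lam becomes the pair of inequalities
    #   A beta <= b + lam   and   -A beta <= lam - b,   with beta = u - v.
    A_ub = np.block([[A, -A], [-A, A]])
    b_ub = np.concatenate([b + lam, lam - b])
    c = np.ones(2 * p)               # sum(u) + sum(v) = ||beta||_1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

# Toy usage (sizes, seed and t are arbitrary): sparse beta, n << p.
rng = np.random.default_rng(0)
n, p, sigma = 72, 256, 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)   # roughly unit-normed columns
beta = np.zeros(p)
beta[:5] = 10.0
y = X @ beta + sigma * rng.standard_normal(n)
t = 4.0                                        # a positive scalar, as in the abstract
lam = (1 + 1 / t) * np.sqrt(2 * np.log(p)) * sigma
beta_hat = dantzig_selector(X, y, lam)
```

For large p one would avoid forming the dense 2p × 2p constraint matrix and hand the solver a sparse formulation instead, but the small dense version above is enough to verify the LP recast mentioned in the abstract.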

Locations

  • The Annals of Statistics
  • arXiv (Cornell University)
  • CaltechAUTHORS (California Institute of Technology)
  • DataCite API

Similar Works

  • Discussion: The Dantzig selector: Statistical estimation when p is much larger than n (2007), Tommaso Cai and Jinchi Lv
  • Near-ideal model selection by ℓ1 minimization (2009), Emmanuel J. Candès and Yaniv Plan
  • Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation (2009), Shuheng Zhou
  • Sparse recovery under matrix uncertainty (2010), Mathieu Rosenbaum and Alexandre B. Tsybakov
  • Randomized pick-freeze for sparse Sobol indices estimation in high dimension (2015), Yohann de Castro and Alexandre Janon
  • De-biasing the Lasso: Optimal Sample Size for Gaussian Designs (2015), Adel Javanmard and Andrea Montanari
  • High-dimensional stochastic optimization with the generalized Dantzig estimator (2008), Karim Lounici
  • Improved Matrix Uncertainty Selector (2011), Mathieu Rosenbaum and Alexandre B. Tsybakov
  • UPS delivers optimal phase diagram in high-dimensional variable selection (2012), Pengsheng Ji and Jiashun Jin
  • Probabilistic Best Subset Selection by Gradient-Based Optimization (2020), Mingzhang Yin, Nhat Ho, Bowei Yan, Xiaoning Qian and Mingyuan Zhou
  • Thresholded Lasso for high dimensional variable selection (2023), Shuheng Zhou
  • Debiasing the lasso: Optimal sample size for Gaussian designs (2018), Adel Javanmard and Andrea Montanari
  • Regularity Properties for Sparse Regression (2016), Edgar Dobriban and Jianqing Fan
  • Covariance matrix estimation and variable selection in high dimension (2013), Mu Cai
  • Thresholded Lasso for high dimensional variable selection and statistical estimation (2010), Shuheng Zhou
  • The constrained Dantzig selector with enhanced consistency (2016), Yinfei Kong, Zemin Zheng and Jinchi Lv
  • Improved matrix uncertainty selector (2013), M. Rosenbaum and Alexandre B. Tsybakov
  • Greedy Variance Estimation for the LASSO (2018), Christopher J. Kennedy and Rachel Ward