Estimation Stability With Cross-Validation (ESCV)

Chinghway Lim, Bin Yu

Type: Article

Publication Date: 2015-04-03

Citations: 91

DOI: https://doi.org/10.1080/10618600.2015.1020159

Abstract

Cross-validation (CV) is often used to select the regularization parameter in high-dimensional problems. However, when applied to the sparse modeling method Lasso, CV leads to models that are unstable in high dimensions, and consequently not suited for reliable interpretation. In this article, we propose a model-free criterion, ESCV, based on a new estimation stability (ES) metric and CV. Our proposed ESCV finds a locally ES-optimal model smaller than the CV choice, so that it fits the data and also enjoys the estimation stability property. We demonstrate that ESCV is an effective alternative to CV at a similar, easily parallelizable computational cost. In particular, we compare the two approaches with respect to several performance measures when applied to the Lasso on both simulated and real datasets. For dependent predictors common in practice, our main finding is that ESCV cuts down false positive rates, often by a large margin, while sacrificing little of the true positive rates. ESCV usually outperforms CV in terms of parameter estimation while performing similarly to CV in terms of prediction. For the two real datasets from neuroscience and cell biology, the models found by ESCV are less than half the size of those found by CV, yet they preserve CV's predictive performance and are corroborated by subject knowledge and independent work. We also discuss some regularization parameter alignment issues that come up in both approaches. Supplementary materials are available online.
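The abstract does not spell out the ES metric or the selection rule, so the following is only a minimal sketch under stated assumptions: ES for a given regularization value is taken to be the variance of the fold-wise fitted vectors divided by the squared norm of their mean, and ESCV is taken to pick the ES-minimizing value among those at least as heavily regularized as the CV choice. The function names `es_metric` and `escv_select`, and the convention that a larger `lambda` means a smaller model, are hypothetical and introduced for illustration. The fold-wise fits and CV errors are assumed to be precomputed by whatever Lasso solver is in use.

```python
from typing import Sequence


def es_metric(fold_fits: Sequence[Sequence[float]]) -> float:
    """Assumed ES metric for one regularization value.

    fold_fits[v] holds the fitted values X @ beta_v obtained from the
    estimate computed with fold v left out.
    """
    n_folds = len(fold_fits)
    n_obs = len(fold_fits[0])
    # Average fitted vector across the fold-wise estimates.
    mean_fit = [sum(fit[i] for fit in fold_fits) / n_folds for i in range(n_obs)]
    # Average squared distance of each fold-wise fit from the mean fit.
    variance = sum(
        sum((fit[i] - mean_fit[i]) ** 2 for i in range(n_obs)) for fit in fold_fits
    ) / n_folds
    norm_sq = sum(m * m for m in mean_fit)
    # An all-zero mean fit makes the ratio undefined; treat it as maximally
    # unstable so it is never selected (an assumption of this sketch).
    return float("inf") if norm_sq == 0.0 else variance / norm_sq


def escv_select(
    lambdas: Sequence[float],
    cv_errors: Sequence[float],
    es_values: Sequence[float],
) -> float:
    """Pick the ES-minimizing lambda among those >= the CV-selected lambda."""
    cv_idx = min(range(len(lambdas)), key=lambda i: cv_errors[i])
    # Larger lambda is assumed to mean heavier regularization (smaller model).
    candidates = [i for i in range(len(lambdas)) if lambdas[i] >= lambdas[cv_idx]]
    best = min(candidates, key=lambda i: es_values[i])
    return lambdas[best]
```

Restricting the search to values no less regularized than the CV choice is what makes the selected model no larger than the CV model in this sketch; the per-fold fits it needs are the same ones CV already computes, which is consistent with the abstract's remark about similar computational cost.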

Locations

arXiv (Cornell University)
Journal of Computational and Graphical Statistics

Similar Works

Title	Year	Authors
Estimation Stability with Cross Validation (ESCV)	2013	Chinghway Lim Bin Yu
The Instability of Cross-Validated Lasso	2013	Kine Veronica Lund
Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions	2021	Philippe Boileau Nima S. Hejazi Mark J. van der Laan Sandrine Dudoit
nestedcv: an R package for fast implementation of nested cross-validation with embedded feature selection designed for transcriptomics and high-dimensional data	2023	Myles Lewis Athina Spiliopoulou Katriona Goldmann Costantino Pitzalis Paul McKeigue Michael R. Barnes
Consistent selection of tuning parameters via variable selection stability	2013	Wei Sun Junhui Wang Yixin Fang
Loss-guided Stability Selection	2022	Tino Werner
$\left( β, \varpi \right)$-stability for cross-validation and the choice of the number of folds	2017	Ning Xu Jian Hong Timothy S. Fisher
On the Selection Stability of Stability Selection and Its Applications	2024	Mahdi Nouraie Samuel Müller
Rademacher upper bounds for cross-validation errors with an application to the lasso	2020	Ning Xu Timothy S. Fisher Jian Hong
Consistent selection of tuning parameters via variable selection stability	2012	Wei Sun Junhui Wang Yixin Fang
Loss-guided stability selection	2023	Tino Werner
Cross-Validation With Confidence	2019	Jing Lei
The restricted consistency property of leave-$n_v$-out cross-validation for high-dimensional variable selection	2013	Yang Feng Yi Yu
Cross-Validated Loss-based Covariance Matrix Estimator Selection in High Dimensions	2022	Philippe Boileau Nima S. Hejazi Mark J. van der Laan Sandrine Dudoit
Gain Confidence, Reduce Disappointment: A New Approach to Cross-Validation for Sparse Regression	2023	Ryan Cory-Wright Andrés Gómez
Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data	2024	Iris Ivy Gauran Hernando Ombao Zhaoxia Yu
Cross-Validation with Confidence	2017	Jing Lei
Cross-validation and regression analysis in high-dimensional sparse linear models	2011	Feng Zhang
Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models	2010	Han Liu Kathryn Roeder Larry Wasserman

Works That Cite This (52)

Title	Year	Authors
ET-Lasso: A New Efficient Tuning of Lasso-type Regularization for High-Dimensional Data	2018	Songshan Yang Jiawei Wen Xiang Zhan Daniel Kifer
ET-Lasso	2019	Songshan Yang Jiawei Wen Xiang Zhan Daniel Kifer
Quality Dimensions of Machine Learning in Official Statistics	2023	Younes Saidani Florian Dumpert Christian Borgs Alexander Brand Andreas Nickl Alexandra Rittmann Johannes Rohde Christian Salwiczek Nina Storfinger S. Strauß
Loss-guided stability selection	2023	Tino Werner
MoST: model specification test by variable selection stability	2024	Xiaonan Hu
Solar: a least-angle regression for stable variable selection in high-dimensional spaces	2020	Ning Xu Timothy S. Fisher Jian Hong
Rademacher upper bounds for cross-validation errors with an application to the lasso	2020	Ning Xu Timothy S. Fisher Jian Hong
Variable importance based interaction modelling with an application on initial spread of COVID-19 in China	2024	Jianqiang Zhang Ze Chen Yuhong Yang Wangli Xu
Stability enhanced variable selection for a semiparametric model with flexible missingness mechanism and its application to the ChAMP study	2019	Yang Yang Jiwei Zhao Gregory E. Wilding Melissa A. Kluczynski Leslie J. Bisson
Integrating additional knowledge into the estimation of graphical models	2021	Yunqi Bu Johannes Lederer

Works Cited by This (45)

Title	Year	Authors
Stability of Dynamical Systems	2018	Nivaldo A. Lemos
Sharp thresholds for high-dimensional and noisy recovery of sparsity	2006	Martin J. Wainwright
$V$-fold cross-validation and $V$-fold penalization in least-squares density estimation	2012	Sylvain Arlot Matthieu Lerasle
Regression and time series model selection in small samples	1989	Clifford M. Hurvich Chih‐Ling Tsai
Instability of least squares, least absolute deviation and least median of squares linear regression, with a comment by Stephen Portnoy and Ivan Mizera and a rejoinder by the author	1998	Steven P. Ellis
Consistency of cross validation for comparing regression procedures	2007	Yuhong Yang
Combining Linear Regression Models	2005	Zheng Yuan Yuhong Yang
Linear Model Selection by Cross-validation	1993	Jun Shao
The Relationship Between Variable Selection and Data Agumentation and a Method for Prediction	1974	David M. Allen
Least angle regression	2004	Bradley Efron Trevor Hastie Iain M. Johnstone Robert Tibshirani