High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Type: Article

Publication Date: 2019-12-19

Citations: 28

DOI: https://doi.org/10.1007/s11222-019-09914-9

Abstract

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2,300 data-generating scenarios, including both synthetic and semi-synthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a 'no panacea' view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.

Locations

  • PubMed Central
  • arXiv (Cornell University)
  • DataCite API
  • Statistics and Computing

Similar Works

  • Two Tales of Variable Selection for High Dimensional Data: Screening and Model Building (2012). Cong Liu
  • Automatically Identifying Relevant Variables for Linear Regression with the Lasso Method: A Methodological Primer for its Application with R and a Performance Contrast Simulation with Alternative Selection Strategies (2019). Sebastian Scherr, Jing Zhou
  • High-dimensional regression with potential prior information on variable importance (2021). Benjamin G. Stokell, Rajen D. Shah
  • Two tales of variable selection for high dimensional regression: Screening and model building (2014). Cong Liu, Tao Shi, Yoonkyung Lee
  • Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems (2009). Peter Hall, H. R. Miller
  • Inference in High Dimensions with the Penalized Score Test (2014). Arend Voorman, Ali Shojaie, Daniela Witten
  • High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection (2019). Frank Emmert‐Streib, Matthias Dehmer
  • High-dimensional regression with potential prior information on variable importance (2022). Benjamin G. Stokell, Rajen D. Shah
  • Data-Driven Random Projection and Screening for High-Dimensional Generalized Linear Models (2024). Roman Parzer, Peter Filzmoser, Laura Vana-Gür
  • Post-Lasso Inference for High-Dimensional Regression (2018). X. Jessie Jeng, Huimin Peng, Wenbin Lu
  • Identifying a minimal class of models for high-dimensional data (2015). Daniel Nevo, Ya’acov Ritov
  • A Comparative Study of Some Variables Selection Methods in High Dimensional Multiple Linear Regression Via Simulation (2022). Media Shamsaddin Bari, Hussein Abdulrahman Hashem
  • Solar: a least-angle regression for accurate and stable variable selection in high-dimensional data (2020). Xu Ning, Timothy S. Fisher, Jian Hong
  • A Survey of Tuning Parameter Selection for High-dimensional Regression (2019). Yunan Wu, Lan Wang
  • Statistics for High-Dimensional Data: Methods, Theory and Applications (2011). Peter Bühlmann, Sara van de Geer
  • Comparison of variable selection procedures and investigation of the role of shrinkage in linear regression - protocol of a simulation study in low-dimensional data (2022). Edwin Kipruto, Willi Sauerbrei
  • Development of Two Methods for Estimating High-Dimensional Data in the Case of Multicollinearity and Outliers (2024). Ahmed A. El-Sheikh, Md. Sadek Ali, Mohamed R. Abonazel
  • A Survey of Tuning Parameter Selection for High-Dimensional Regression (2020). Yunan Wu, Lan Wang
  • Minimal class of models for high-dimensional data (2015). Daniel Nevo, Ya’acov Ritov