Subset Selection with Shrinkage: Sparse Linear Modeling When the SNR Is Low

Type: Article

Publication Date: 2022-05-24

Citations: 43

DOI: https://doi.org/10.1287/opre.2022.2276

Abstract

Learning Compact High-Dimensional Models in Noisy Environments

Building compact, interpretable statistical models in which the output depends on a small number of input features is a well-known problem in modern analytics applications. A fundamental tool in this context is the prominent best subset selection (BSS) procedure, which seeks the best linear fit to the data subject to a constraint on the number of nonzero features. Although BSS works exceptionally well in some regimes, its out-of-sample predictive performance can be poor when the underlying data are noisy, which is common in practice. In this paper, we explore this less well understood overfitting behavior of BSS in low-signal noisy environments and propose alternatives that appear to mitigate such shortcomings. We study the theoretical statistical properties of our proposed regularized BSS procedure and present promising computational results on various data sets, using tools from integer programming and first-order methods.
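
As context for the abstract, the sketch below shows a generic shrinkage-regularized best subset selection problem; it is an illustration rather than the paper's exact estimator, and the symbols λ (shrinkage weight), q (penalty exponent), and k (sparsity budget) are placeholders whose precise roles and tuning may differ from the paper.

```latex
% A minimal sketch of regularized best subset selection (illustrative only;
% the paper's exact estimator and parameterization may differ).
% Given a response y in R^n and a design matrix X in R^{n x p}:
\begin{equation*}
  \hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
    \; \tfrac{1}{2}\,\lVert y - X\beta \rVert_{2}^{2}
    + \lambda\,\lVert \beta \rVert_{q}^{q}
  \quad \text{subject to} \quad \lVert \beta \rVert_{0} \le k,
\end{equation*}
% where \lVert\beta\rVert_0 counts the nonzero coefficients (the BSS
% constraint), \lambda \ge 0 controls the amount of shrinkage, and
% q \in \{1, 2\} gives a lasso- or ridge-type penalty; \lambda = 0
% recovers plain best subset selection.
```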

Locations

  • Operations Research
  • arXiv (Cornell University)

Similar Works

  • Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low (2017), Rahul Mazumder, Peter Radchenko, Antoine Dedieu
  • Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms (2018), Hussein Hazimeh, Rahul Mazumder
  • Robust subset selection (2020), Ryan Thompson
  • Sparse Regression: Scalable Algorithms and Empirical Performance (2020), Dimitris Bertsimas, Jean Pauphilet, Bart Van Parys
  • The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization (2015), Rahul Mazumder, Peter Radchenko
  • Robust subset selection (2022), Ryan Thompson
  • Best subset selection via a modern optimization lens (2016), Dimitris Bertsimas, Angela King, Rahul Mazumder
  • Best Subset Selection via a Modern Optimization Lens (2015), Dimitris Bertsimas, Angela G. King, Rahul Mazumder
  • Understanding Best Subset Selection: A Tale of Two C(omplex)ities (2023), Saptarshi Roy, Ambuj Tewari, Ziwei Zhu
  • A Model Selection Criterion for High-Dimensional Linear Regression (2018), Arash Owrang, Magnus Jansson
  • Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique (2023), Junxian Zhu, Jin Zhu, Borui Tang, Xuanyu Chen, Hongmei Lin, Xueqin Wang
  • Variable selection in linear regression models: Choosing the best subset is not always the best choice (2023), Moritz Hanke, Louis Dijkstra, Ronja Foraita, Vanessa Didelez
  • The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization (2017), Rahul Mazumder, Peter Radchenko
  • Probabilistic Best Subset Selection via Gradient-Based Optimization (2020), Mingzhang Yin, Nhat Ho, Bowei Yan, Xiaoning Qian, Mingyuan Zhou
  • Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives (2021), Hussein Hazimeh, Rahul Mazumder, Peter Radchenko
  • When is best subset selection the "best"? (2020), Jianqing Fan, Yongyi Guo, Ziwei Zhu

Works That Cite This (31)

  • An extended Newton-type algorithm for ℓ2-regularized sparse logistic regression and its efficiency for classifying large-scale datasets (2021), Rui Wang, Naihua Xiu, Shenglong Zhou
  • Robust subset selection (2022), Ryan Thompson
  • Sparse Unit-Sum Regression (2019), Nick Koning, Paul A. Bekker
  • Estimation of a treatment effect based on a modified covariates method with L0 norm (2023), Kensuke Tanioka, Kaoru Okuda, Satoru Hiwa, Tomoyuki Hiroyasu
  • Cutting-plane algorithm for estimation of sparse Cox proportional hazards models (2023), Hiroki Saishu, Kota Kudo, Yuichi Takano
  • Model Selection in Generalized Linear Models (2023), Abdulla Mamun, S. R. Paul
  • Grouped variable selection with discrete optimization: Computational and statistical perspectives (2023), Hussein Hazimeh, Rahul Mazumder, Peter Radchenko
  • Scalable Algorithms for the Sparse Ridge Regression (2018), Weijun Xie, Xinwei Deng
  • Feature and functional form selection in additive models via mixed-integer optimization (2024), Manuel Navarro-García, Vanesa Guerrero, María Durbán, Arturo del Cerro
  • Sparse HP filter: Finding kinks in the COVID-19 contact rate (2020), Sokbae Lee, Myung Hwan Seo, Youngki Shin, Yuan Liao