Iterative hard thresholding for model selection in genome‐wide association studies

Type: Article

Publication Date: 2017-09-06

Citations: 6

DOI: https://doi.org/10.1002/gepi.22068

Abstract

A genome-wide association study (GWAS) correlates marker and trait variation in a study sample. Each subject is genotyped at a multitude of SNPs (single nucleotide polymorphisms) spanning the genome. Here, we assume that subjects are randomly collected unrelated individuals and that trait values are normally distributed or can be transformed to normality. Over the past decade, geneticists have been remarkably successful in applying GWAS analysis to hundreds of traits. The massive amount of data produced in these studies presents unique computational challenges. Penalized regression with the ℓ1 penalty (LASSO) or the minimax concave penalty (MCP) is capable of selecting a handful of associated SNPs from millions of potential SNPs. Unfortunately, model selection can be corrupted by false positives and false negatives, obscuring the genetic underpinning of a trait. We compare LASSO and MCP penalized regression to iterative hard thresholding (IHT). On GWAS regression data, IHT is better at model selection and comparable in speed to both methods of penalized regression. This conclusion holds for both simulated and real GWAS data. IHT fosters parallelization and scales well in problems with large numbers of causal markers. Our parallel implementation of IHT accommodates SNP genotype compression and exploits multiple CPU cores and graphics processing units (GPUs). This allows statistical geneticists to leverage commodity desktop computers in GWAS analysis and to avoid supercomputing. Source code is freely available at https://github.com/klkeys/IHT.jl.
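The abstract does not spell out the IHT update, so the following Python sketch illustrates the basic idea on a least-squares problem; it is not the IHT.jl implementation. Each iteration takes a gradient step on the squared-error loss and then hard-thresholds, keeping only the k coefficients of largest magnitude and zeroing the rest. Unlike the LASSO, whose soft-thresholding shrinks every retained coefficient toward zero, IHT leaves the retained coefficients unshrunk. The fixed step size 1/‖X‖₂², the simulated genotypes and effect sizes, the assumption that the model size k is known, and the function names `hard_threshold` and `iht` are all illustrative choices, not details taken from the paper.

```python
import numpy as np


def hard_threshold(beta, k):
    """Projection onto k-sparse vectors: keep the k largest |beta_j|, zero the rest."""
    out = np.zeros_like(beta)
    if k > 0:
        keep = np.argpartition(np.abs(beta), -k)[-k:]
        out[keep] = beta[keep]
    return out


def iht(X, y, k, max_iter=1000, tol=1e-10):
    """Sketch of IHT for least squares: minimize 0.5*||y - X b||^2 subject to ||b||_0 <= k."""
    beta = np.zeros(X.shape[1])
    # Assumption: fixed step 1/L, where L = ||X||_2^2 is the Lipschitz constant
    # of the gradient of 0.5*||y - X b||^2 (not necessarily the rule used in IHT.jl).
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(max_iter):
        # Gradient step on the loss, then hard-threshold back to k nonzeros.
        beta_new = hard_threshold(beta + step * (X.T @ (y - X @ beta)), k)
        converged = np.linalg.norm(beta_new - beta) <= tol * (1.0 + np.linalg.norm(beta))
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical simulated data: additively coded genotypes (0/1/2), standardized by column.
rng = np.random.default_rng(2017)
n, p, k = 500, 1000, 5
maf = rng.uniform(0.1, 0.5, size=p)                      # minor allele frequencies
G = rng.binomial(2, maf, size=(n, p)).astype(float)      # genotype matrix
X = (G - G.mean(axis=0)) / G.std(axis=0)
true_idx = rng.choice(p, size=k, replace=False)          # causal SNPs
beta_true = np.zeros(p)
beta_true[true_idx] = [1.0, -0.8, 0.6, -0.5, 0.7]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = iht(X, y, k)
print("selected SNPs:", sorted(np.flatnonzero(beta_hat).tolist()))
print("causal SNPs:  ", sorted(true_idx.tolist()))
```

The support of the returned vector is the selected set of SNPs. In practice k is unknown and would have to be chosen, for example by cross-validation.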

Locations

  • Genetic Epidemiology
  • PubMed Central
  • arXiv (Cornell University)
  • Europe PMC (PubMed Central)
  • PubMed
  • DataCite API

Similar Works

  • Iterative Hard Thresholding in GWAS: Generalized Linear Models, Prior Weights, and Double Sparsity (2019). Benjamin B. Chu, Kevin L. Keys, Chris German, Hua Zhou, Jin Zhou, Eric M. Sobel, Janet S. Sinsheimer, Kenneth Lange
  • Iterative hard thresholding in genome-wide association studies: Generalized linear models, prior weights, and double sparsity (2020). Benjamin B. Chu, Kevin L. Keys, Chris German, Hua Zhou, Jin Zhou, Eric M. Sobel, Janet S. Sinsheimer, Kenneth Lange
  • Efficient Penalized Generalized Linear Mixed Models for Variable Selection and Genetic Risk Prediction in High-Dimensional Data (2022). Julien St-Pierre, Karim Oualkacha, Sahir Bhatnagar
  • Efficient penalized generalized linear mixed models for variable selection and genetic risk prediction in high-dimensional data (2023). Julien St-Pierre, Karim Oualkacha, Sahir Bhatnagar
  • A Fast and Flexible Algorithm for Solving the Lasso in Large-scale and Ultrahigh-dimensional Problems (2019). Junyang Qian, Wenfei Du, Yosuke Tanigawa, Matthew Aguirre, Robert Tibshirani, Manuel A. Rivas, Trevor Hastie
  • A fast and efficient smoothing approach to Lasso regression and an application in statistical genetics: polygenic risk scores for chronic obstructive pulmonary disease (COPD) (2021). Georg Hahn, Sharon M. Lutz, Nilanjana Laha, Michael H. Cho, Edwin K. Silverman, Christoph Lange
  • Variable Selection with Second-Generation P-Values (2021). Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume
  • A smoothed version of the Lassosum penalty for fitting integrated risk models (2021). Georg Hahn, Dmitry Prokopenko, Sharon M. Lutz, Kristina Mullin, Rudolph E. Tanzi, Christoph Lange
  • A fast and efficient smoothing approach to Lasso regression and an application in statistical genetics: polygenic risk scores for chronic obstructive pulmonary disease (COPD) (2020). Georg Hahn, Sharon M. Lutz, Nilanjana Laha, Michael H. Cho, Edwin K. Silverman, Christoph Lange
  • Variable Selection with Second-Generation P-Values (2020). Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume
  • Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings (2022). Dongjin Li, Somak Dutta, Vivekananda Roy
  • Bayesian variable selection regression for genome-wide association studies and other large-scale problems (2011). Yongtao Guan, Matthew Stephens
  • Thresholding-based Iterative Selection Procedures for Generalized Linear Models (2009). Yiyuan She
  • A Smoothed Version of the Lassosum Penalty for Fitting Integrated Risk Models Using Summary Statistics or Individual-Level Data (2022). Georg Hahn, Dmitry Prokopenko, Sharon M. Lutz, Kristina Mullin, Rudolph E. Tanzi, Michael H. Cho, Edwin K. Silverman, Christoph Lange
  • Variable Selection With Second-Generation P-Values (2021). Yi Zuo, Thomas G. Stewart, Jeffrey D. Blume
  • Evaluating methods for Lasso selective inference in biomedical research by a comparative simulation study (2020). Michael N. Kammer, Daniela Dunkler, Stefan Michiels, Georg Heinze
  • High-Performance Statistical Computing in the Computing Environments of the 2020s (2020). Seyoon Ko, Hua Zhou, Jin Zhou, Joong‐Ho Won
  • Feature-specific inference for penalized regression using local false discovery rates (2018). Ryan S. Miller, Patrick Breheny
  • Penalized regression and model selection methods for polygenic scores on summary statistics (2020). Jack Pattee, Wei Pan