False Discovery Rate Control via Data Splitting

Type: Article

Publication Date: 2022-03-31

Citations: 36

DOI: https://doi.org/10.1080/01621459.2022.2060113

Abstract

Selecting relevant features associated with a given response variable is an important problem in many scientific fields. Quantifying quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent interest. This article introduces a data-splitting method (referred to as "DS") to asymptotically control the FDR while maintaining a high power. For each feature, DS constructs a test statistic by estimating two independent regression coefficients via data splitting. FDR control is achieved by taking advantage of the statistic's property that, for any null feature, its sampling distribution is symmetric about zero; whereas for a relevant feature, its sampling distribution has a positive mean. Furthermore, a Multiple Data Splitting (MDS) method is proposed to stabilize the selection result and boost the power. Surprisingly, with the FDR under control, MDS not only helps overcome the power loss caused by data splitting, but also results in a lower variance of the false discovery proportion (FDP) compared with all other methods in consideration. Extensive simulation studies and a real-data application show that the proposed methods are robust to the unknown distribution of features, easy to implement and computationally efficient, and are often the most powerful ones among competitors especially when the signals are weak and correlations or partial correlations among features are high. Supplementary materials for this article are available online.
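The single-split idea in the abstract can be illustrated with a minimal sketch. Several choices here are assumptions for illustration, not details stated in the abstract: ordinary least squares on each half (which requires fewer features than samples per half), the combining rule sign(b1*b2) * (|b1| + |b2|), and the threshold rule that estimates the FDP as the ratio of statistics below -t to statistics above t. The function names `mirror_statistics` and `select_features` are hypothetical.

```python
import numpy as np

def mirror_statistics(X, y, rng):
    """Split the rows in two, fit least squares on each half, and
    combine the two coefficient estimates feature by feature."""
    n = X.shape[0]
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]
    b1, *_ = np.linalg.lstsq(X[half1], y[half1], rcond=None)
    b2, *_ = np.linalg.lstsq(X[half2], y[half2], rcond=None)
    # Assumed combining rule: for a null feature the sign of b1*b2 is
    # equally likely to be + or -, so the statistic is symmetric about 0;
    # for a relevant feature both estimates agree in sign, so it is positive.
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

def select_features(stats, q):
    """Return indices with statistic above the smallest threshold t whose
    estimated FDP, #{stat < -t} / max(#{stat > t}, 1), is at most q."""
    candidates = np.concatenate(([0.0], np.sort(np.abs(stats))))
    for t in candidates:
        fdp_hat = np.sum(stats < -t) / max(np.sum(stats > t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(stats > t)
    return np.array([], dtype=int)

# Synthetic illustration (made-up data): 5 relevant features out of 20.
rng = np.random.default_rng(0)
n, p, k = 400, 20, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 2.0
y = X @ beta + rng.standard_normal(n)

stats = mirror_statistics(X, y, rng)
selected = select_features(stats, q=0.1)
print("selected features:", selected.tolist())
```

The negative statistics act as a stand-in for the unobserved count of false positives on the positive side, which is what the symmetry property makes possible. Repeating the split with different random partitions and aggregating the selections is the motivation for the MDS variant, which this sketch does not implement.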

Locations

  • arXiv (Cornell University)
  • OPAL (Open@LaTrobe) (La Trobe University)
  • Journal of the American Statistical Association

Similar Works

  • False Discovery Rate Control via Data Splitting (2020) - Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu
  • Model-free controlled variable selection via data splitting (2022) - Yixin Han, Xu Guo, Changliang Zou
  • A Feature Selection Method that Controls the False Discovery Rate (2022) - Mehdi Rostami, Olli Saarela
  • False Discovery Rate Control via Data Splitting for Testing-after-Clustering (2024) - Lijun Wang, Yingxin Lin, Hongyu Zhao
  • Threshold Selection in Feature Screening for Error Rate Control (2021) - Xu Guo, Haojie Ren, Changliang Zou, Runze Li
  • False Discovery Control in Multiple Testing: A Brief Overview of Theories and Methodologies (2024) - Jianliang He, Bowen Gang, Luella Fu
  • False discovery rate control for high-dimensional Cox model with uneven data splitting (2023) - Yeheng Ge, Sijia Zhang, Xiao Zhang
  • Local False Discovery Rate Estimation with Competition-Based Procedures for Variable Selection (2022) - Xiaoya Sun, Yan Fu
  • Data-driven selection of the number of change-points via error rate control (2021) - Hui Chen, Haojie Ren, Fang Yao, Changliang Zou
  • A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models (2020) - Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu
  • A flexible approach: variable selection procedures with multilayer FDR control via e-values (2024) - Chengyao Yu, Ruixing Ming, Xiao Min, Zhanfeng Wang
  • Efficient Stratified Testing Procedure for a False Discovery Rate (2014) - Seungbong Han, Adin‐Cristian Andrei, Kam Wah Tsui
  • SyNPar: Synthetic Null Data Parallelism for High-Power False Discovery Rate Control in High-Dimensional Variable Selection (2025) - Changhu Wang, Ziheng Zhang, Jingyi Jessica Li
  • A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting (2006) - Daniel B. Rubin, Sandrine Dudoit, Mark van der Laan
  • Evaluations of FWER-controlling methods in multiple hypothesis testing (2010) - Yi‐Ting Hwang, Jia-Jung Lai, Shyh-Tyan Ou
  • Evaluating FDR and stratified FDR control approaches for high-throughput biological studies (2012) - Jinfeng Zou, Guini Hong, Junjie Zheng, Chunxiang Hao, Jing Wang, Zheng Guo
  • Controlling the False Discovery Rate via symmetrized data aggregation based on SLOPE (2020) - Yihe Guo, Xuemin Zi
  • Review of Fundamental Statistical Concepts (2010) - Francisco Azuaje
  • False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation (2021) - Lilun Du, Xu Guo, Wenguang Sun, Changliang Zou
  • Multiple Testing for Composite Null with FDR Control Guarantee (2021) - Ran Dai, Cheng Zheng