Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments

Type: Article

Publication Date: 2023-03-07

Citations: 0

DOI: https://doi.org/10.1186/s12859-023-05202-6

Abstract

We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects, among all possible ways of assigning samples to batches, the batch allocation that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case-control study (30 per group) with a covariate of interest (case vs. control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values came from a publicly available dataset of pancreas islet cell expression data. Batch effects, set to twice the median biological variation across the gene expression dataset, were added to these data to simulate a batch effect condition. Bias was calculated as the absolute difference between the betas observed under each batch allocation strategy and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. To understand the performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the 'true' dataset (CAPN13 gene).

Before batch correction, under the null hypothesis (β1), maximum absolute bias and the root mean square (RMS) of maximum absolute bias were minimized by the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and the RMS of maximum absolute bias were also consistently lower with the optimal allocation strategy. Both ComBat and the regression batch adjustment performed well: bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) remained consistently lower with the optimal allocation strategy under both the null and alternative hypotheses.

Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
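
The allocation criterion described in the abstract (choose the assignment whose batches differ least in average propensity score) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the propensity scores have already been estimated by some model, measures imbalance as the largest minus the smallest batch mean, and uses a random search over equal-sized allocations in place of enumerating every possible assignment. The function names, the example scores, and the parameters `n_candidates` and `seed` are all hypothetical.

```python
import random
from statistics import mean


def batch_means(scores, allocation, n_batches):
    """Mean propensity score of each batch under a given allocation."""
    return [
        mean(s for s, b in zip(scores, allocation) if b == batch)
        for batch in range(n_batches)
    ]


def imbalance(scores, allocation, n_batches):
    """Spread (max minus min) of the batch mean propensity scores."""
    means = batch_means(scores, allocation, n_batches)
    return max(means) - min(means)


def allocate(scores, n_batches, n_candidates=2000, seed=0):
    """Random search for a balanced allocation with minimal imbalance.

    Assumption: candidates are sampled at random rather than enumerating
    every possible allocation, which is infeasible beyond small studies.
    """
    rng = random.Random(seed)
    # Balanced batch labels, e.g. [0, 1, 2, 0, 1, 2, ...] for three batches.
    labels = [i % n_batches for i in range(len(scores))]
    best, best_cost = None, float("inf")
    for _ in range(n_candidates):
        candidate = labels[:]
        rng.shuffle(candidate)
        cost = imbalance(scores, candidate, n_batches)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Hypothetical propensity scores for 12 samples (not data from the study).
scores = [0.12, 0.25, 0.31, 0.38, 0.44, 0.47, 0.52, 0.58, 0.63, 0.71, 0.80, 0.91]
allocation, cost = allocate(scores, n_batches=3)
```

Here `allocation[i]` is the batch label assigned to sample `i`, and `cost` is the remaining spread in batch mean propensity score. Raising `n_candidates` trades run time for a closer approximation to the best allocation.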
