Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment

Type: Article

Publication Date: 2016-01-12

Citations: 43

DOI: https://doi.org/10.1186/s12859-015-0870-z

Abstract

In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches. Systematic differences between these batches not attributable to the biological signal of interest are denoted as batch effects. If ignored when conducting analyses on the combined data, batch effects can lead to distortions in the results. In this paper we present FAbatch, a general, model-based method for correcting for such batch effects in the case of an analysis involving a binary target variable. It is a combination of two commonly used approaches: location-and-scale adjustment and data cleaning by adjustment for distortions due to latent factors. We compare FAbatch extensively to the most commonly applied competitors on the basis of several performance metrics. FAbatch can also be used in the context of prediction modelling to eliminate batch effects from new test data. This important application is illustrated using real and simulated data. We implemented FAbatch and various other functionalities in the R package bapred available online from CRAN.

FAbatch is seen to be competitive in many cases and above average in others. In our analyses, the only cases where it failed to adequately preserve the biological signal were when there were extremely outlying batches and when the batch effects were very weak compared to the biological signal.

As seen in this paper, batch effect structures found in real datasets are diverse. Current batch effect adjustment methods are often either too simplistic or make restrictive assumptions, which can be violated in real datasets. Due to the generality of its underlying model and its ability to perform well, FAbatch represents a reliable tool for batch effect adjustment for most situations found in practice.
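To make the location-and-scale idea mentioned in the abstract concrete, the following is a minimal sketch of the simplest such adjustment: centering and scaling each feature separately within each batch. It is not FAbatch and not the bapred API; the function name `standardize_per_batch` and the toy data are assumptions made for illustration only, and the latent factor step of FAbatch is not shown.

```python
from statistics import mean, stdev


def standardize_per_batch(rows, batches):
    """Center and scale every feature within each batch (hypothetical helper).

    rows    : list of equal-length lists of floats (samples x features)
    batches : list of batch labels, one per sample
    """
    n_features = len(rows[0])
    adjusted = [list(r) for r in rows]  # copy so the input is left untouched
    for label in set(batches):
        idx = [i for i, b in enumerate(batches) if b == label]
        for j in range(n_features):
            column = [rows[i][j] for i in idx]
            location = mean(column)
            # stdev needs at least two observations; fall back to 1.0 for
            # single-sample batches or constant features to avoid dividing by zero
            scale = stdev(column) if len(column) > 1 else 0.0
            if scale == 0:
                scale = 1.0
            for i in idx:
                adjusted[i][j] = (rows[i][j] - location) / scale
    return adjusted


# Toy data: two batches whose features differ in location and scale
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0],
     [11.0, 0.0], [13.0, 1.0], [15.0, 2.0]]
batch = ["A", "A", "A", "B", "B", "B"]
X_adj = standardize_per_batch(X, batch)
```

After this adjustment every feature has mean 0 and unit standard deviation within each batch, so batch-specific shifts and spreads no longer separate the groups. Note that plain standardization also removes any part of the biological signal that is confounded with batch, which is the kind of limitation that motivates more elaborate models.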

Locations

  • BMC Bioinformatics
  • PubMed Central
  • Europe PMC (PubMed Central)
  • HAL (Le Centre pour la Communication Scientifique Directe)
  • PubMed
  • Open Access LMU (Ludwig-Maximilians-Universität München)

Similar Works

  • Batch Effects Correction with Unknown Subtypes (2018) - Xiangyu Luo, Yingying Wei
  • PLSDA-batch: a multivariate framework to correct for batch effects in microbiome data (2023) - Yiwen Wang, Kim‐Anh Lê Cao
  • Empirical Bayes shrinkage and false discovery rate estimation, allowing for unwanted variation (2018) - David Gerard, Matthew Stephens
  • A multivariate method to correct for batch effects in microbiome data (2020) - Yiwen Wang, Kim‐Anh Lê Cao
  • The importance of batch sensitization in missing value imputation (2023) - Harvard Wai Hann Hui, Weijia Kong, Hui Peng, Wilson Wen Bin Goh
  • Why Batch Sensitization is Important for Missing Value Imputation (2022) - P Sun, Wilson Wen Bin Goh
  • Heterogeneous Large Datasets Integration Using Bayesian Factor Regression (2020) - Alejandra Avalos-Pacheco, David Rossell, Richard S. Savage
  • Heterogeneous large datasets integration using Bayesian factor regression (2018) - Alejandra Avalos-Pacheco, David Rossell, Richard S. Savage
  • Comparison of statistical methods and the use of quality control samples for batch effect correction in human transcriptome data (2018) - Almudena Espín-Pérez, Christopher J. Portier, Marc Chadeau‐Hyam, Karin van Veldhoven, Jos Kleinjans, Theo M. de Kok
  • Mitigating the adverse impact of batch effects in sample pattern detection (2018) - Teng Fei, Tengjiao Zhang, Weiyang Shi, Tianwei Yu
  • Addressing the challenges of uncertainty in regression models for high dimensional and heterogeneous data from observational studies (2020) - Simon Klau
  • Removal of batch effects using distribution-matching residual networks (2017) - Uri Shaham, Kelly Stanton, Jun Zhao, Huamin Li, Khadir Raddassi, Ruth R. Montgomery, Yuval Kluger
  • Factor regression for dimensionality reduction and data integration techniques with applications to cancer data (2018) - Alejandra Avalos Pacheco
  • OSAT: a tool for sample-to-batch allocations in genomics experiments (2012) - Li Yan, Chang‐Xing Ma, Dan Wang, Qiang Hu, Maochun Qin, Jeffrey M. Conroy, Lara E. Sucheston, Christine B. Ambrosone, Candace S. Johnson, Jianmin Wang
  • Batch effects removal for microbiome data via conditional quantile regression (2022) - Wodan Ling, Jiuyao Lu, Ni Zhao, Anju Lulla, Anna Plantinga, Weijia Fu, Angela Zhang, Hongjiao Liu, Hoseung Song, Zhigang Li
  • Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses (2015) - Vegard Nygaard, Einar Andreas Rødland, Eivind Hovig
  • Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed (2015) - Laurent Jacob, Johann A. Gagnon-Bartsch, Terence P. Speed
  • BatMan: Mitigating Batch Effects via Stratification for Survival Outcome Prediction (2022) - Ai Ni, Mengling Liu, Li‐Xuan Qin
  • BatMan: Mitigating Batch Effects Via Stratification for Survival Outcome Prediction (2023) - Ai Ni, Mengling Liu, Li‐Xuan Qin