SNP genotype calling and quality control for multi-batch-based studies

Genes Genomics. 2019 Aug;41(8):927-939. doi: 10.1007/s13258-019-00827-5. Epub 2019 May 6.

Abstract

Background: In genetic analyses, the term 'batch effect' refers to systematic differences caused by batch heterogeneity. Controlling this unintended effect is the most important step in the quality control (QC) processes that precede analysis. Existing statistical methods do not adequately control batch effects, and new approaches are required.

Methods: In this report, we propose a new method to detect heterogeneity of probe intensities among batches, together with a procedure for genotype calling and QC in the presence of a batch effect. First, a multivariate analysis of variance (MANOVA) is conducted to test for differences in probe intensities among batches. If heterogeneity is detected, subjects are clustered with a K-medoid algorithm applied to the averages of the probe intensity measurements for each batch, and the genotypes of subjects in different clusters are called separately.
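The two steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several things in it are assumptions made only for illustration: each subject is represented by two probe intensities at a single SNP, the MANOVA is summarized by Wilks' lambda with Bartlett's chi-square approximation (the abstract does not say which test statistic was used), the number of clusters is fixed at two, the data are simulated, and the names `wilks_manova_2d` and `k_medoids_exhaustive` are hypothetical. The K-medoid step is solved here by exhaustive search over candidate medoids, which is only practical for a small number of batches; a PAM-type heuristic would normally replace it.

```python
import math
import random
from itertools import combinations


def centroid(points):
    """Mean of a list of (x, y) intensity pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sscp(points, centre):
    """Sums of squares and cross-products about `centre`: (sxx, sxy, syy)."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    return sxx, sxy, syy


def wilks_manova_2d(groups):
    """One-way MANOVA for two response variables.

    groups maps a batch label to a list of (x, y) intensity pairs.
    Returns Wilks' lambda and an approximate p-value (Bartlett's
    chi-square approximation, an assumption of this sketch).
    """
    all_points = [p for pts in groups.values() for p in pts]
    n, g = len(all_points), len(groups)
    total = sscp(all_points, centroid(all_points))
    within = [0.0, 0.0, 0.0]
    for pts in groups.values():
        within = [w + s for w, s in zip(within, sscp(pts, centroid(pts)))]
    det = lambda m: m[0] * m[2] - m[1] ** 2  # determinant of a symmetric 2x2
    lam = det(within) / det(total)           # Wilks' lambda = |W| / |T|
    stat = -(n - 1 - (2 + g) / 2) * math.log(lam)
    # Chi-square survival function with df = 2 * (g - 1); the closed form
    # below is valid because the degrees of freedom are even.
    half = stat / 2
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(g - 1)
    )
    return lam, p_value


def k_medoids_exhaustive(points, k):
    """K-medoid clustering by trying every set of k medoids (small inputs only)."""
    def cost(medoids):
        return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

    best = min(combinations(range(len(points)), k), key=cost)
    labels = [
        min(range(k), key=lambda c: math.dist(p, points[best[c]])) for p in points
    ]
    return labels, best


# Simulated intensities: batches b1 and b2 share one centre, b3 and b4 another.
rng = random.Random(42)
centres = {"b1": (1.0, 1.0), "b2": (1.0, 1.0), "b3": (1.6, 0.7), "b4": (1.6, 0.7)}
batches = {
    name: [(rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)) for _ in range(30)]
    for name, (cx, cy) in centres.items()
}

lam, p_value = wilks_manova_2d(batches)
print(f"Wilks' lambda = {lam:.4f}, approximate p = {p_value:.3g}")

if p_value < 0.05:
    # Heterogeneity detected: cluster the per-batch average intensities.
    names = list(batches)
    batch_means = [centroid(batches[name]) for name in names]
    labels, medoids = k_medoids_exhaustive(batch_means, k=2)
    for name, label in zip(names, labels):
        print(f"batch {name} -> cluster {label}")
```

In this sketch the cluster labels only indicate which batches would be grouped together; genotype calling itself is not shown and would be run separately on the subjects of each cluster.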

Results: The proposed method was used to assess genotyping data of 3619 subjects consisting of 1074 patients with Alzheimer's disease, 296 with mild cognitive impairment (MCI), and 1153 controls. The proposed method improved the accuracy of the called genotypes without requiring many subjects and SNPs to be filtered out, and is therefore a reasonable approach for controlling batch effects.

Conclusions: We proposed a new strategy that detects batch effects from probe intensity measurements and calls genotypes in the presence of batch effects. Application of the proposed method to real data shows that it offers a balanced approach, improving genotype accuracy while limiting the number of subjects and SNPs that are filtered out. Furthermore, the proposed method can be extended to various scenarios with a simple modification.

Keywords: Batch effect; Calling; Genome-wide association analysis; K-medoid clustering; Quality control.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alzheimer Disease / genetics
  • Analysis of Variance
  • Cognitive Dysfunction / genetics
  • Genetic Heterogeneity
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / standards
  • Genotyping Techniques / methods*
  • Genotyping Techniques / standards
  • Humans
  • Polymorphism, Single Nucleotide*