Combining sequence data from multiple studies: Impact of analysis strategies on rare variant calling and association results

Genet Epidemiol. 2020 Jan;44(1):41-51. doi: 10.1002/gepi.22261. Epub 2019 Sep 14.

Abstract

Individual sequencing studies often have limited sample sizes and therefore limited power to detect trait associations with rare variants. A common strategy is to aggregate data from multiple studies. For studying rare variants, calling all samples jointly is the gold standard strategy, but it can be difficult to implement due to privacy restrictions and computational burden. Here, we compare joint calling to the alternative of single-study calling in terms of variant detection sensitivity and genotype accuracy as a function of sequencing coverage and assess their impact on downstream association analysis. To do so, we analyze deep-coverage (~82×) exome and low-coverage (~5×) genome sequence data on 2,250 individuals from the Genetics of Type 2 Diabetes study jointly and separately within five geographic cohorts. For rare single nucleotide variants (SNVs): (a) ≥97% of discovered SNVs are found by both calling strategies; (b) nonreference concordance with a set of highly accurate genotypes is ≥99% for both calling strategies; (c) meta-analysis has similar power to joint analysis in deep-coverage sequence data but can be less powerful in low-coverage sequence data. Given similar data processing and quality control steps, we recommend single-study calling as a viable alternative to joint calling for analyzing SNVs of all minor allele frequencies in deep-coverage data.
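The abstract reports genotype accuracy as nonreference concordance with a set of highly accurate genotypes. The sketch below shows one common way this metric is computed. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; the function name `nonreference_concordance` is hypothetical, and the study's exact definition (for example, how missing calls are handled) may differ.

```python
from typing import Optional, Sequence


def nonreference_concordance(
    test: Sequence[Optional[int]], truth: Sequence[Optional[int]]
) -> float:
    """Fraction of informative genotypes on which test and truth agree.

    Assumed coding: 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing.
    A genotype pair is informative if at least one of the two calls is
    non-reference; pairs where both calls are hom-ref are excluded.
    """
    if len(test) != len(truth):
        raise ValueError("genotype vectors must have equal length")
    informative = 0
    concordant = 0
    for g_test, g_truth in zip(test, truth):
        if g_test is None or g_truth is None:
            continue  # assumption: missing calls are skipped, not counted as errors
        if g_test == 0 and g_truth == 0:
            continue  # both hom-ref: trivially concordant, excluded
        informative += 1
        if g_test == g_truth:
            concordant += 1
    if informative == 0:
        raise ValueError("no non-reference genotypes to compare")
    return concordant / informative
```

For example, `nonreference_concordance([0, 1, 2, 0, 1, 0, 1], [0, 1, 2, 1, 0, 0, 1])` finds five informative genotype pairs, three of which agree, and returns 0.6. Excluding pairs where both calls are homozygous reference keeps the metric from being inflated by the many trivially concordant reference calls at rare-variant sites.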

Keywords: Sequencing studies; joint analysis; meta-analysis; rare variants.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Diabetes Mellitus, Type 2 / genetics*
  • Exome / genetics
  • Gene Frequency / genetics*
  • Genotype
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Polymorphism, Single Nucleotide / genetics*