Extracting actionable information from genome scans

Genet Epidemiol. 2013 Jan;37(1):48-59. doi: 10.1002/gepi.21682. Epub 2012 Sep 19.

Abstract

Genome-wide association studies have discovered numerous genetic variants significantly associated with various phenotypes. However, significant signals explain only a small portion of the variation in many traits. One explanation is that the missing variation resides in "suggestive signals," i.e., variants with reasonably small P-values. However, it is not clear how to capture this information and use it optimally to design and analyze future studies. We propose to extract the available information from a genome scan by accurately estimating the means of univariate statistics. The means are estimated by: (i) computing the sum of squares (SS) of a genome scan's univariate statistics, (ii) using SS to estimate the expected SS for the means (SSM) of univariate statistics, and (iii) constructing accurate soft threshold (ST) estimators for means of univariate statistics by requiring that the SS of these estimators equals the SSM. Compared with competing estimators, ST estimators explain a substantially higher fraction of the variability in true means. The accuracy of the proposed estimators can be used to design two-tier follow-up studies in which regions close to variants having ST-estimated means above a certain threshold are sequenced at high coverage and the rest of the genome is sequenced at low coverage. This follow-up approach reduces the sequencing burden by at least an order of magnitude compared with high-coverage sequencing of the whole genome. Finally, we suggest ways in which ST methodology can be used to improve signal detection in future sequencing studies and to perform general statistical model selection.
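The three estimation steps can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the details below are assumptions and not the authors' implementation: each univariate statistic is taken to be a Z-score with unit variance, so that E[z_i^2] = mu_i^2 + 1 and SSM can be estimated as max(SS - m, 0) for m statistics; the estimator is taken to be the standard soft-threshold form sign(z) * max(|z| - t, 0); and the threshold t is found by bisection. The function names `estimate_ssm`, `fit_threshold`, and `st_estimates` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink each statistic toward zero by t; values with |z| <= t become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def estimate_ssm(z):
    """Steps (i)-(ii): compute SS and estimate the SS of the means (SSM).

    Assumption (not from the abstract): z_i ~ N(mu_i, 1), so
    E[sum z_i^2] = sum mu_i^2 + m, giving SSM_hat = max(SS - m, 0).
    """
    ss = float(np.sum(z ** 2))
    return max(ss - z.size, 0.0)


def fit_threshold(z, ssm, tol=1e-10, max_iter=200):
    """Step (iii): find t such that the SS of the ST estimates equals ssm.

    The SS of the soft-thresholded values decreases monotonically in t,
    from SS at t = 0 down to 0 at t = max|z|, so bisection suffices.
    """
    lo, hi = 0.0, float(np.max(np.abs(z)))
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if float(np.sum(soft_threshold(z, mid) ** 2)) > ssm:
            lo = mid  # estimates still too large: shrink more
        else:
            hi = mid  # estimates too small: shrink less
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def st_estimates(z):
    """Return (ST-estimated means, threshold t, estimated SSM)."""
    z = np.asarray(z, dtype=float)
    ssm = estimate_ssm(z)
    t = fit_threshold(z, ssm)
    return soft_threshold(z, t), t, ssm


# Toy Z-scores, invented purely for illustration (not data from the study).
z_scores = np.array([3.0, -2.0, 0.5, 0.1])
means_hat, t_hat, ssm_hat = st_estimates(z_scores)
```

Under these assumptions, a two-tier follow-up design would flag variants whose `means_hat` exceed a chosen cutoff in absolute value for high-coverage sequencing; the cutoff itself is a study-design choice that this sketch does not address.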

MeSH terms

  • Data Interpretation, Statistical*
  • Genetic Variation
  • Genome, Human*
  • Genome-Wide Association Study*
  • Humans
  • Models, Genetic*
  • Polymorphism, Single Nucleotide*
  • Schizophrenia / genetics
  • White People / genetics