Priors, population sizes, and power in genome-wide hypothesis tests

Jitong Cai; Jianan Zhan; Dan E Arking; Joel S Bader

doi:10.1186/s12859-023-05261-9

BMC Bioinformatics. 2023 Apr 26;24(1):170. doi: 10.1186/s12859-023-05261-9.

Authors

Jitong Cai¹, Jianan Zhan¹, Dan E Arking², Joel S Bader³ ⁴

Affiliations

¹ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA.
² Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA.
³ Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA. joel.bader@jhu.edu.
⁴ Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA. joel.bader@jhu.edu.

Abstract

Background: Genome-wide tests, including genome-wide association studies (GWAS) of germ-line genetic variants, driver tests of cancer somatic mutations, and transcriptome-wide association tests of RNAseq data, carry a high multiple testing burden. This burden can be overcome by enrolling larger cohorts or alleviated by using prior biological knowledge to favor some hypotheses over others. Here we compare these two methods in terms of their abilities to boost the power of hypothesis testing.

Results: We provide a quantitative estimate for progress in cohort sizes and present a theoretical analysis of the power of oracular hard priors: priors that select a subset of hypotheses for testing, with an oracular guarantee that all true positives are within the tested subset. This theory demonstrates that for GWAS, strong priors that limit testing to 100-1000 genes provide less power than typical annual 20-40% increases in cohort sizes. Furthermore, non-oracular priors that exclude even a small fraction of true positives from the tested set can perform worse than not using a prior at all.
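The trade-off between restricting the tested set and enlarging the cohort can be illustrated with a rough calculation. The sketch below is not the authors' analysis: it assumes a two-sided z-test with Bonferroni correction, a fixed effect size, and the standard approximation that the required cohort size is proportional to (z_alpha + z_power)^2. The function names, the 20,000-gene baseline, the 0.05 family-wise error rate, and the 80% power target are all illustrative assumptions. Because the Bonferroni threshold grows only logarithmically with the number of tests, shrinking the tested set by orders of magnitude corresponds to a comparatively modest equivalent increase in cohort size.

```python
from statistics import NormalDist

# Standard normal distribution, used for quantiles (stdlib, Python 3.8+).
_STD_NORMAL = NormalDist()


def relative_sample_size(n_tests, alpha=0.05, power=0.8):
    """Quantity proportional to the cohort size needed to reach `power`
    for a fixed effect size under Bonferroni correction over `n_tests`.

    Assumption (not from the paper): two-sided z-test, so the required
    sample size scales as (z_alpha + z_power) ** 2.
    """
    # Critical value for a two-sided test at per-test level alpha / n_tests.
    z_alpha = -_STD_NORMAL.inv_cdf(alpha / (2 * n_tests))
    # Quantile corresponding to the target power.
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) ** 2


def equivalent_cohort_increase(n_tests_full, n_tests_prior, alpha=0.05, power=0.8):
    """Fractional cohort-size increase that gives the unrestricted test
    the same power as an oracular prior keeping `n_tests_prior` hypotheses."""
    full = relative_sample_size(n_tests_full, alpha, power)
    restricted = relative_sample_size(n_tests_prior, alpha, power)
    return full / restricted - 1.0


if __name__ == "__main__":
    # Hypothetical baseline of 20,000 gene-level tests.
    for kept in (1000, 100):
        gain = equivalent_cohort_increase(20_000, kept)
        print(f"prior keeping {kept} genes ~ {gain:.0%} larger cohort")
```

Under these assumptions the equivalent cohort increase depends on the ratio of squared critical values rather than on the ratio of test counts, which is why even a very selective oracular prior buys only a limited gain. The numbers printed here will differ from the paper's estimates, which rest on its own model.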

Conclusion: Our results provide a theoretical explanation for the continued dominance of simple, unbiased univariate hypothesis tests for GWAS: if a statistical question can be answered by larger cohort sizes, it should be answered by larger cohort sizes rather than by more complicated biased methods involving priors. We suggest that priors are better suited for non-statistical aspects of biology, such as pathway structure and causality, that are not yet easily captured by standard hypothesis tests.

Keywords: Genome-wide association studies (GWAS); Genomics; Multiple hypothesis testing; Population genetics; Statistical genetics.

MeSH terms

Genome-Wide Association Study*
Humans
Polymorphism, Single Nucleotide*
Population Density
Transcriptome
