The phenotype-genotype reference map: Improving biobank data science through replication

Am J Hum Genet. 2023 Sep 7;110(9):1522-1533. doi: 10.1016/j.ajhg.2023.07.012. Epub 2023 Aug 21.

Abstract

Population-scale biobanks linked to electronic health record data provide vast opportunities to extend our knowledge of human genetics and discover new phenotype-genotype associations. Given their dense phenotype data, biobanks can also facilitate replication studies on a phenome-wide scale. Here, we introduce the phenotype-genotype reference map (PGRM), a set of 5,879 genetic associations from 523 GWAS publications that can be used for high-throughput replication experiments. PGRM phenotypes are standardized as phecodes, ensuring interoperability between biobanks. We applied the PGRM to five ancestry-specific cohorts from four independent biobanks and found evidence of robust replications across a wide array of phenotypes. We show how the PGRM can be used to detect data corruption and to empirically assess parameters for phenome-wide studies. Finally, we use the PGRM to explore factors associated with replicability of GWAS results.
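The abstract describes using the PGRM for high-throughput replication: each reference association (a SNP paired with a phecode) is tested in a biobank cohort, and the results are summarized. The sketch below shows one way such a summary could be computed. It is not the authors' pipeline. The class names, the `replication_rate` helper, and the replication criterion (p < 0.05 with the same direction of effect as the source GWAS) are assumptions made for this example. The SNP identifiers and phecode strings in the usage data are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ReferenceAssoc:
    """One hypothetical reference association: a SNP paired with a phecode."""
    snp: str        # variant identifier (placeholder values below)
    phecode: str    # phenotype standardized as a phecode string
    direction: int  # +1 or -1: effect direction reported in the source GWAS


@dataclass(frozen=True)
class CohortResult:
    """Association result observed in one biobank cohort (hypothetical)."""
    beta: float
    p_value: float


def replication_rate(
    reference: Iterable[ReferenceAssoc],
    results: Dict[Tuple[str, str], CohortResult],
    alpha: float = 0.05,
) -> Optional[float]:
    """Return the fraction of tested reference associations that replicate.

    Assumed criterion: p < alpha and the same sign of effect as the reference.
    Associations without a cohort result (e.g. the phecode has no cases in the
    cohort) are skipped. Returns None if nothing could be tested.
    """
    tested = 0
    replicated = 0
    for assoc in reference:
        res = results.get((assoc.snp, assoc.phecode))
        if res is None:
            continue  # not testable in this cohort
        tested += 1
        same_direction = (res.beta > 0) == (assoc.direction > 0)
        if res.p_value < alpha and same_direction:
            replicated += 1
    return replicated / tested if tested else None


# Usage with placeholder data: snpA replicates, snpB has the opposite
# direction, and snpC has no cohort result, so the rate is 1 / 2.
reference = [
    ReferenceAssoc("snpA", "250.2", +1),
    ReferenceAssoc("snpB", "401.1", -1),
    ReferenceAssoc("snpC", "272.1", +1),
]
results = {
    ("snpA", "250.2"): CohortResult(beta=0.12, p_value=0.001),
    ("snpB", "401.1"): CohortResult(beta=0.05, p_value=0.01),
}
print(replication_rate(reference, results))  # 0.5
```

Because phecodes give every cohort the same phenotype vocabulary, the same reference list can be scored against several biobanks and the rates compared directly. A cohort whose rate falls far below that of comparable cohorts would merit a closer look, which is in the spirit of the data-corruption check the abstract mentions.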

Keywords: GWAS; PheWAS; biobanks; data quality; electronic health records; phecodes; phenotyping; replication.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Biological Specimen Banks*
  • Data Science*
  • Genotype
  • Humans
  • Phenomics
  • Phenotype