REPRODUCIBLE AND SHAREABLE QUANTIFICATIONS OF PATHOGENICITY

Pac Symp Biocomput. 2016;21:231-42.

Abstract

There are now hundreds of thousands of pathogenicity assertions that relate genetic variation to disease, but most of this clinically utilized variation has no accepted quantitative disease risk estimate. Recent disease-specific studies have used control sequence data to reclassify large amounts of previously reported pathogenic variation, but there is a critical need to scale up both the pace and the feasibility of such pathogenicity reassessments across human disease. In this manuscript we develop a shareable computational framework to quantify pathogenicity assertions. We release a reproducible "digital notebook" that integrates executable code, text annotations, and mathematical expressions in a freely accessible statistical environment. We extend previous disease-specific pathogenicity assessments to over 6,000 diseases and 160,000 assertions in the ClinVar database. Investigators can use this platform to prioritize variants for reassessment and to tailor genetic model parameters (such as prevalence and heterogeneity) to expose the uncertainty underlying pathogenicity-based risk assessments. Finally, we release a website that, for a queried disease, links users to its pathogenic variation, the supporting literature, and implied disease risk calculations under user-defined and disease-specific genetic risk models, to facilitate variant reassessments.
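The kind of implied disease risk calculation mentioned above can be illustrated with Bayes' rule, P(D | V) = P(V | D) P(D) / P(V). The sketch below is a minimal illustration under stated assumptions, not the authors' notebook code: it assumes a dominant model in which P(V | D) is bounded by an allelic-heterogeneity parameter h (the fraction of cases a single variant could explain), P(D) is the disease prevalence K, and P(V) is the Hardy-Weinberg carrier frequency 1 - (1 - f)^2 for a control allele frequency f. The function names, the lowest-risk-first ranking rule, and the example variants are hypothetical.

```python
from typing import Dict, List, Tuple


def implied_penetrance(prevalence: float, heterogeneity: float, allele_freq: float) -> float:
    """Implied risk P(D | V) for carriers of one variant, via Bayes' rule.

    Illustrative assumptions (not the paper's exact model):
      * P(V | D) is at most `heterogeneity`, the fraction of cases this
        single variant could plausibly explain.
      * P(V) is the carrier frequency under Hardy-Weinberg equilibrium,
        1 - (1 - f)^2, where f is the control-population allele frequency.
    """
    for name, value in (("prevalence", prevalence),
                        ("heterogeneity", heterogeneity),
                        ("allele_freq", allele_freq)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {value}")
    carrier_freq = 1.0 - (1.0 - allele_freq) ** 2
    # Bayes' rule: P(D | V) = P(V | D) * P(D) / P(V)
    risk = heterogeneity * prevalence / carrier_freq
    # A value above 1 means the assumed inputs are mutually inconsistent; cap it.
    return min(risk, 1.0)


def prioritize_for_reassessment(allele_freqs: Dict[str, float],
                                prevalence: float,
                                heterogeneity: float) -> List[Tuple[str, float]]:
    """Rank variants from lowest to highest implied penetrance.

    Hypothetical ranking rule: variants whose control frequency implies the
    lowest disease risk are treated as the strongest reassessment candidates.
    """
    scored = [(vid, implied_penetrance(prevalence, heterogeneity, f))
              for vid, f in allele_freqs.items()]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical inputs: prevalence 1/500, each variant assumed to explain
    # at most 1% of cases.
    freqs = {"variant_A": 1e-3, "variant_B": 1e-5, "variant_C": 5e-4}
    for vid, risk in prioritize_for_reassessment(freqs, prevalence=1 / 500, heterogeneity=0.01):
        print(f"{vid}: implied penetrance = {risk:.4f}")
```

With these hypothetical inputs, a variant seen at an allele frequency of 0.1% in controls implies a penetrance of only about 1%, which is why frequent "pathogenic" variants surface first for reassessment. Changing `prevalence` or `heterogeneity` shows how strongly the implied risk depends on these model parameters.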

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Computational Biology / statistics & numerical data
  • Databases, Genetic / statistics & numerical data
  • Disease / genetics
  • Exome / genetics
  • Gene Frequency
  • Genetic Association Studies / statistics & numerical data
  • Genetic Variation
  • Genome, Human
  • Humans
  • Models, Genetic
  • Reproducibility of Results
  • Risk Factors
  • Software
  • Virulence / genetics*