Ranking of non-coding pathogenic variants and putative essential regions of the human genome

Nat Commun. 2019 Nov 20;10(1):5241. doi: 10.1038/s41467-019-13212-3.

Abstract

A gene is considered essential if its loss of function results in loss of viability or fitness, or in disease. This concept is well established for coding genes; however, non-coding regions are thought to be less likely to be determinants of critical functions. Here we train a machine learning model using functional, mutational and structural features, including new genome essentiality metrics, 3D genome organization and enhancer reporter data, to identify deleterious variants in non-coding regions. We assess the model for functional correlates by using data from tiling-deletion-based and CRISPR interference screens of activity of cis-regulatory elements in over 3 Mb of genome sequence. Finally, we explore two use cases that involve indels and the disruption of enhancers associated with a developmental disease. We rank variants in the non-coding genome according to their predicted deleteriousness. The model prioritizes non-coding regions associated with regulation of important genes and with cell viability, an in vitro surrogate of essentiality.
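The general workflow described in the abstract (train a supervised model on per-variant features, then rank variants by predicted deleteriousness) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the model type, so a plain logistic regression stands in for it; the three feature columns are hypothetical placeholders for "functional", "mutational" and "structural" scores; and the training data are synthetic, not taken from the study.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent (stand-in model)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, features):
    """Predicted probability that a variant is deleterious."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)


def rank_variants(w, b, variants):
    """Return (name, score) pairs sorted from most to least deleterious."""
    scored = [(name, score(w, b, feats)) for name, feats in variants.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Synthetic training set: hypothetical [functional, mutational, structural]
# scores in [0, 1]; label 1 = pathogenic, 0 = benign. Not real data.
rng = random.Random(0)


def synthetic(label, n):
    centre = 0.7 if label else 0.3
    return [[min(1.0, max(0.0, rng.gauss(centre, 0.1))) for _ in range(3)]
            for _ in range(n)]


X = synthetic(1, 50) + synthetic(0, 50)
y = [1] * 50 + [0] * 50
w, b = train_logistic(X, y)

# Hypothetical candidate variants to prioritize.
candidates = {
    "variant_A": [0.90, 0.80, 0.85],
    "variant_B": [0.50, 0.50, 0.50],
    "variant_C": [0.10, 0.20, 0.15],
}
ranking = rank_variants(w, b, candidates)
for name, s in ranking:
    print(f"{name}\t{s:.3f}")
```

In this toy setting the ranking simply follows the feature values, since all three synthetic features carry the same signal. A real application would need curated pathogenic and control variants, genuine annotations, and held-out evaluation.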

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromatin
  • Computer Simulation
  • DNA
  • Enhancer Elements, Genetic
  • Gene Expression
  • Genes, Reporter
  • Genetic Variation / genetics*
  • Genome, Human / genetics*
  • Humans
  • INDEL Mutation
  • Introns / genetics*
  • Mutation
  • Nucleic Acid Conformation
  • Supervised Machine Learning*

Substances

  • Chromatin
  • DNA