Regulatory element-based prediction identifies new susceptibility regulatory variants for osteoporosis

Hum Genet. 2017 Aug;136(8):963-974. doi: 10.1007/s00439-017-1825-4. Epub 2017 Jun 20.

Abstract

Although genome-wide association studies (GWASs) have identified many susceptibility genes for osteoporosis, a large part of the heritability remains unexplained. Integrating regulatory information with GWASs could offer new insights into the biological link between susceptibility SNPs and osteoporosis. We trained five machine learning classifiers on osteoporosis-associated variants and regulatory feature data, selected the best-performing classifier, and used it to score SNPs genome-wide in order to discover susceptibility regulatory variants. We then used data from the Genetic Factors for Osteoporosis Consortium (GEFOS) and three in-house GWAS samples to validate the associations of the predicted positive SNPs. The random forest classifier performed best among the five methods, with an F1 score of 0.8871. Using the optimized model, we predicted 37,584 candidate SNPs for osteoporosis. In the meta-analysis, a set of regulatory variants was significantly associated with osteoporosis after multiple-testing correction and contributed to the expression of known osteoporosis-associated protein-coding genes. In summary, combining GWASs and regulatory elements through machine learning could provide additional information for understanding the mechanism of osteoporosis. The regulatory variants we predicted provide novel targets for research into the etiology and treatment of osteoporosis.
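
The model-selection step described above (compare several classifiers and keep the one with the highest F1 score) can be illustrated with a minimal sketch. This is not the authors' code: the labels, the predictions, and the name `other_classifier` are hypothetical toy values, and only the F1 definition, 2 × precision × recall / (precision + recall), is standard.

```python
def f1_score(y_true, y_pred):
    """Binary F1 score, where label 1 marks a positive (e.g. disease-associated) SNP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_best(y_true, predictions):
    """Score each classifier's held-out predictions and return the best name plus all scores."""
    scores = {name: f1_score(y_true, y_pred) for name, y_pred in predictions.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical held-out labels and predictions (toy data, not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
predictions = {
    "random_forest": [1, 1, 1, 0, 0, 1, 1, 0],
    "other_classifier": [1, 0, 1, 0, 1, 0, 0, 0],
}

best_name, scores = select_best(y_true, predictions)
print(best_name, {name: round(score, 4) for name, score in scores.items()})
```

F1 balances precision and recall, which makes it a reasonable single criterion when positive and negative training variants are imbalanced. The abstract does not state how the training set was balanced or split.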

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Line
  • Galanin / genetics
  • Galanin / metabolism
  • Gene Frequency
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study
  • Humans
  • Models, Genetic
  • Osteoporosis / genetics*
  • Polymorphism, Single Nucleotide*
  • Regulatory Sequences, Nucleic Acid*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Separase / genetics
  • Separase / metabolism

Substances

  • GAL protein, human
  • Galanin
  • ESPL1 protein, human
  • Separase