Prediction of Coronary Artery Disease Risk Using Genetic and Phenotypic Variables

Stud Health Technol Inform. 2024 Jan 25:310:1021-1025. doi: 10.3233/SHTI231119.

Abstract

Coronary artery disease (CAD) has the highest disease burden worldwide. To manage this burden, predictive models are required to screen patients for preventative treatment. A range of variables have been explored for their capacity to predict disease, including phenotypic variables (age, sex, BMI and smoking status), medical imaging variables (carotid artery thickness) and genotypic variables. We use machine learning models and the UK Biobank cohort to measure the prediction capacity of these three variable categories, both in combination and in isolation. We demonstrate that phenotypic variables from the Framingham risk score have the best prediction capacity, although a combination of phenotypic, medical imaging and genotypic variables delivers the most specific models. Furthermore, we demonstrate that Variant Spark, a random forest based GWAS platform, performs effective feature selection for SNP-based genotype variables, identifying 115 SNPs significantly associated with the CAD phenotype.
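
Evaluating three variable categories "in combination and in isolation" implies seven candidate feature sets (three single categories, three pairs, and the full set). The sketch below only illustrates that enumeration; it is not the authors' pipeline. The column names, the `CATEGORIES` mapping and the `feature_sets` helper are hypothetical placeholders, and no model fitting or UK Biobank data is involved.

```python
from itertools import combinations

# Hypothetical column names, grouped by the three variable categories
# named in the abstract. The SNP identifiers are placeholders only.
CATEGORIES = {
    "phenotypic": ["age", "sex", "bmi", "smoking_status"],
    "imaging": ["carotid_thickness"],
    "genotypic": ["snp_a", "snp_b"],
}


def feature_sets(categories):
    """Yield (label, columns) for every non-empty combination of categories."""
    names = list(categories)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Concatenate the columns of each category in the combination.
            columns = [col for name in combo for col in categories[name]]
            yield "+".join(combo), columns


# One entry per experiment: each category alone, each pair, and all three.
experiments = dict(feature_sets(CATEGORIES))

for label, columns in experiments.items():
    print(f"{label}: {len(columns)} variables")
```

Each entry in `experiments` could then be passed to whatever model and metric are in use, so that the same evaluation is repeated over every feature set and the results are directly comparable.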

Keywords: Cardiovascular disease; disease risk prediction; machine learning.

MeSH terms

  • Carotid Intima-Media Thickness
  • Coronary Artery Disease* / diagnostic imaging
  • Coronary Artery Disease* / genetics
  • Genotype
  • Humans
  • Machine Learning
  • Phenotype