A Machine Learning Method to Identify Genetic Variants Potentially Associated With Alzheimer's Disease

Front Genet. 2021 Jun 14:12:647436. doi: 10.3389/fgene.2021.647436. eCollection 2021.

Abstract

There is hope that genomic information will assist the prediction, treatment, and understanding of Alzheimer's disease (AD). Here, using exome data from ∼10,000 individuals, we explore machine learning neural network (NN) methods to estimate the impact of SNPs (i.e., genetic variants) on AD risk. We develop an NN-based method (netSNP) that identifies hundreds of novel, potentially protective or at-risk AD-associated SNPs, each with an effect measure; the majority have a frequency below 0.01. For case individuals, the number of "protective" (or "at-risk") netSNP-identified SNPs in their genome correlates positively (or inversely) with their age at AD diagnosis and inversely (or positively) with autopsy neuropathology. Incorporating the effect measure strengthens these correlations. Simulations suggest that our results are not due to genetic linkage, overfitting, or bias introduced by netSNP. These findings suggest that netSNP can identify SNPs associated with AD pathophysiology that may help with the diagnosis and mechanistic understanding of the disease.
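The per-individual analysis summarized above counts how many identified SNPs each case carries, optionally weights them by an effect measure, and correlates that score with age at diagnosis. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' netSNP code. Genotypes are reduced to sets of hypothetical SNP identifiers (`p1`, `p2`, `p3`). The effect weights and ages are made-up toy values. Pearson correlation is assumed as the correlation measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def snp_burden(genotype, snp_set, effect=None):
    """Score one individual: count carried SNPs from snp_set, or sum their
    effect weights when an effect mapping is given (hypothetical weighting)."""
    return sum(
        (effect[s] if effect is not None else 1.0)
        for s in genotype
        if s in snp_set
    )


# Hypothetical "protective" SNP set and effect weights (toy values only).
protective = {"p1", "p2", "p3"}
effect = {"p1": 0.5, "p2": 1.0, "p3": 2.0}

# Toy case individuals: (SNPs carried, age at AD diagnosis). Not real data.
cases = [
    (set(), 65),
    ({"p1"}, 70),
    ({"p1", "p2"}, 74),
    ({"p1", "p2", "p3"}, 80),
]

ages = [age for _, age in cases]
counts = [snp_burden(g, protective) for g, _ in cases]
weighted = [snp_burden(g, protective, effect) for g, _ in cases]

r_count = pearson(counts, ages)
r_weighted = pearson(weighted, ages)
print(f"count-based r = {r_count:.3f}, effect-weighted r = {r_weighted:.3f}")
```

In this toy example both scores correlate positively with age at diagnosis, mirroring the direction reported for "protective" SNPs. Whether effect weighting raises the correlation depends on the data; in these made-up values it does not.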

Keywords: Alzheimer’s; disease; machine learning; neural network; polygenic.