Evaluation of potential novel variations and their interactions related to bipolar disorders: analysis of genome-wide association study data

Cengizhan Acikel; Yesim Aydin Son; Cemil Celik; Husamettin Gul

doi:10.2147/NDT.S112558


Neuropsychiatr Dis Treat. 2016 Nov 24;12:2997-3004. doi: 10.2147/NDT.S112558. eCollection 2016.

Authors

Cengizhan Acikel¹, Yesim Aydin Son², Cemil Celik³, Husamettin Gul⁴

Affiliations

¹ Department of Biostatistics, Gulhane Military Medical Academy.
² Department of Health Informatics, Graduate School of Informatics, Middle East Technical University.
³ Department of Medical Psychiatry.
⁴ Department of Medical Informatics, Gulhane Military Medical Academy, Ankara, Turkey.

Abstract

Background: Multifactor dimensionality reduction (MDR) is a nonparametric approach for detecting relevant interactions between single-nucleotide polymorphisms (SNPs). The aim of this study was to build the best genomic model based on SNP associations and to identify candidate polymorphisms that constitute the underlying molecular basis of bipolar disorders.
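The core MDR step can be sketched as follows: for a chosen set of SNPs, each multilocus genotype combination (a "cell") is labeled high risk if its case:control ratio meets or exceeds the overall case:control ratio, and the resulting two-class model is scored by how well it classifies subjects. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded 0/1/2 (minor-allele count), the data are a hypothetical toy set, and `mdr_model` and `best_mdr_model` are hypothetical names. The sketch scores models on training accuracy only; full MDR implementations typically add cross-validation (and often balanced accuracy), which is omitted here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the study): each row is one subject, each
# column one SNP coded as minor-allele count (0, 1, 2).
GENOTYPES = [
    [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0],
    [0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1],
]
LABELS = [0, 0, 1, 1, 0, 0, 1, 1]  # 1 = case, 0 = control


def mdr_model(genotypes, labels, snps):
    """Pool the multilocus genotype cells of `snps` into high/low risk.

    Returns (cell -> predicted label, training classification accuracy).
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snps)
        counts[cell][0 if y == 1 else 1] += 1
    # A cell is high risk (predict case) when its case:control ratio reaches
    # the overall case:control ratio; cross-multiplied to avoid dividing by 0.
    risk = {
        cell: int(cases * n_controls >= controls * n_cases)
        for cell, (cases, controls) in counts.items()
    }
    correct = sum(
        risk[tuple(row[i] for i in snps)] == y
        for row, y in zip(genotypes, labels)
    )
    return risk, correct / len(labels)


def best_mdr_model(genotypes, labels, order=2):
    """Exhaustively test every `order`-way SNP combination; keep the most accurate."""
    n_snps = len(genotypes[0])
    best = None
    for snps in combinations(range(n_snps), order):
        risk, acc = mdr_model(genotypes, labels, snps)
        if best is None or acc > best[2]:
            best = (snps, risk, acc)
    return best


snps, risk, acc = best_mdr_model(GENOTYPES, LABELS, order=2)
print(snps, acc)  # the SNP pair (0, 1) separates cases from controls perfectly
```

Setting `order=3` searches three-way combinations instead; the number of candidate combinations grows as the binomial coefficient of the number of SNPs and the interaction order, so exhaustive search quickly becomes expensive on genome-wide data.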

Methods: This study used data from the Whole-Genome Association Study of Bipolar Disorder (dbGaP [database of Genotypes and Phenotypes] study accession number: phs000017.v3.p1). After preprocessing of the genotyping data, three classification-based data mining methods (ie, random forest, naïve Bayes, and k-nearest neighbor) were applied. Additionally, the MDR method was used as a nonparametric, model-free approach to evaluate the SNP profiles. The performance of these methods was evaluated using the true classification rate, recall (sensitivity), precision (positive predictive value), and F-measure.
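The four performance measures follow from the confusion matrix of predicted versus observed case/control status. A minimal sketch, assuming cases are coded 1 and treated as the positive class, the true classification rate is read as overall accuracy, and the F-measure is the balanced F1 form (the abstract states none of these explicitly); `classification_metrics` is a hypothetical helper, not the authors' code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics, treating `positive` (assumed: case) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    total = len(pairs)
    return {
        # Assumed meaning: share of all subjects classified correctly.
        "true_classification_rate": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # sensitivity
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # positive predictive value
        # F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        "f_measure": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
    }


# Hypothetical predictions for 4 cases and 4 controls (not study data).
metrics = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0])
print(metrics)
```

Here every case is detected (recall 1.0) but two controls are misclassified, so precision drops to 4/6 and the F-measure to 0.8. scikit-learn's `sklearn.metrics` module offers `accuracy_score`, `recall_score`, `precision_score`, and `f1_score` for the same quantities.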

Results: Random forests, naïve Bayes, and k-nearest neighbors identified 16, 13, and ten candidate SNPs, respectively. Surprisingly, the top six SNPs were reported by all three methods. Random forests and k-nearest neighbors outperformed naïve Bayes, with recall values >0.95. MDR, in contrast, generated a five-SNP model with comparable predictive performance. Although MDR identified a different SNP profile from the classification-based models, all models mapped SNPs to the DOCK10 gene.

Conclusion: Three classification-based data mining approaches (random forests, naïve Bayes, and k-nearest neighbors) prioritized similar SNP profiles as predictors of bipolar disorders, whereas MDR found different SNPs through analysis of two-way and three-way interactions. The smaller number of associated SNPs discovered by MDR, achieved without loss of classification performance, would facilitate validation studies and decision support models and would reduce the cost of developing predictive and diagnostic tests. Nevertheless, we emphasize that translating genomic models into the clinical setting requires models with higher classification performance.

Keywords: Bipolar disorders; Data Mining; Decision Support; GWAS; MDR; SNP.
