MegaSNPHunter: a learning approach to detect disease predisposition SNPs and high level interactions in genome wide association study

BMC Bioinformatics. 2009 Jan 9:10:13. doi: 10.1186/1471-2105-10-13.

Abstract

Background: Interactions among multiple single nucleotide polymorphisms (SNPs) are widely hypothesized to affect an individual's susceptibility to complex diseases. Although many methods have been proposed to identify and quantify the importance of multi-SNP interactions, few can handle genome-wide data, because the search space grows combinatorially and high-order interactions are difficult to evaluate statistically with limited samples.
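To give a sense of the combinatorial growth mentioned above, the following minimal sketch counts the candidate SNP combinations of order 2 and 3. The panel sizes (about 300,000 and 500,000 markers) are rough assumptions inferred from the chip names, not figures reported in the abstract, and `interaction_count` is a hypothetical helper introduced only for illustration.

```python
from math import comb

# Approximate marker counts; assumed from the chip names, not stated in the abstract.
panels = {"~300K SNPs": 300_000, "~500K SNPs": 500_000}


def interaction_count(n_snps: int, order: int) -> int:
    """Return the number of distinct SNP subsets of size `order`."""
    return comb(n_snps, order)


for label, n_snps in panels.items():
    for order in (2, 3):
        total = interaction_count(n_snps, order)
        print(f"{label}, order {order}: {total:.3e} candidate combinations")
```

Even at order 2 the count is in the tens of billions, which suggests why exhaustive testing of every combination becomes impractical as the interaction order rises.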

Results: Three comparative experiments are designed to evaluate the performance of MegaSNPHunter. The first uses synthetic data generated from epistasis models. The second uses a genome-wide study of Parkinson disease, with data acquired using Illumina HumanHap300 SNP chips. The third uses the rheumatoid arthritis study from the Wellcome Trust Case Control Consortium (WTCCC), which used the Affymetrix GeneChip 500K Mapping Array Set. MegaSNPHunter outperforms the best existing solution in this area and reports many potential interactions for the two real studies.

Conclusion: The experimental results on synthetic data and on two real data sets demonstrate that the proposed approach outperforms the best currently available solution for large-scale SNP data, both in speed and in the detection of potential interactions that had not been identified before. To our knowledge, MegaSNPHunter is the first approach capable of identifying disease-associated SNP interactions from WTCCC studies, and it is promising for practical disease prognosis.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arthritis, Rheumatoid / genetics
  • Artificial Intelligence
  • Chi-Square Distribution
  • Computational Biology / methods*
  • Computer Simulation
  • Epistasis, Genetic
  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods*
  • Humans
  • Models, Genetic
  • Oligonucleotide Array Sequence Analysis
  • Parkinson Disease / genetics
  • Polymorphism, Single Nucleotide*
  • Reproducibility of Results