Candidate disease gene prediction using Gentrepid: application to a genome-wide association study on coronary artery disease

Mol Genet Genomic Med. 2014 Jan;2(1):44-57. doi: 10.1002/mgg3.40. Epub 2013 Nov 13.

Abstract

Current single-locus-based analyses and candidate disease gene prediction methodologies used in genome-wide association studies (GWAS) do not capitalize on the wealth of the underlying genetic data, nor on the functional data available from molecular biology. Here, we applied Gentrepid, a candidate disease gene prediction tool, to GWAS data from the Wellcome Trust Case Control Consortium (WTCCC) on coronary artery disease (CAD). Gentrepid uses a multiple-locus-based approach, drawing on protein pathway- or domain-based data to make predictions. Known disease genes may be used as additional information (seeded method), or predictions can be based entirely on GWAS single nucleotide polymorphisms (SNPs) (ab initio method). We examined specific predictions made by Gentrepid for CAD in detail and compared them with known genetic data and the scientific literature. Gentrepid was able to extract known disease genes from the candidate search space and to predict plausible novel disease genes from both known and novel WTCCC-implicated loci. The disease gene candidates are consistent with known biological information. The results demonstrate that this computational approach is feasible and that it is a valuable discovery tool for geneticists.
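The contrast between the ab initio and seeded modes can be illustrated with a toy sketch of pathway-based, multiple-locus candidate selection. This is not Gentrepid's implementation. It assumes a simplified rule: in ab initio mode a gene near a GWAS SNP becomes a candidate if it shares a pathway with a gene from a *different* implicated locus, and in seeded mode a locus gene becomes a candidate if it shares a pathway with a known disease gene. All locus labels, gene names, pathway memberships, and function names below are hypothetical, and each gene is assumed to belong to only one locus.

```python
from itertools import combinations


def ab_initio_candidates(loci, pathways):
    """Hypothetical ab initio rule: keep genes that share a pathway
    with at least one gene from a different GWAS-implicated locus."""
    # Map each gene to its locus (toy assumption: one locus per gene).
    gene_to_locus = {g: locus for locus, genes in loci.items() for g in genes}
    candidates = set()
    for members in pathways.values():
        hits = [g for g in members if g in gene_to_locus]
        for g1, g2 in combinations(hits, 2):
            if gene_to_locus[g1] != gene_to_locus[g2]:
                candidates.update((g1, g2))
    return candidates


def seeded_candidates(loci, pathways, seeds):
    """Hypothetical seeded rule: keep locus genes that share a pathway
    with a known disease gene (seed) other than themselves."""
    locus_genes = {g for genes in loci.values() for g in genes}
    candidates = set()
    for members in pathways.values():
        seeds_here = members & seeds
        for g in members & locus_genes:
            if seeds_here - {g}:
                candidates.add(g)
    return candidates


# Hypothetical example data (not from the WTCCC CAD study).
loci = {
    "locus1": {"GENE_A", "GENE_B"},
    "locus2": {"GENE_C"},
    "locus3": {"GENE_D"},
}
pathways = {
    "pathway_X": {"GENE_A", "GENE_C", "GENE_E"},
    "pathway_Y": {"GENE_B", "GENE_F"},
    "pathway_Z": {"GENE_D", "GENE_F"},
}
seeds = {"GENE_F"}  # hypothetical known disease gene

print(sorted(ab_initio_candidates(loci, pathways)))      # ['GENE_A', 'GENE_C']
print(sorted(seeded_candidates(loci, pathways, seeds)))  # ['GENE_B', 'GENE_D']
```

In this toy example the two modes pick up different genes: the ab initio rule links GENE_A and GENE_C because they sit in separate loci but share a pathway, while the seeded rule recovers GENE_B and GENE_D only through their shared pathways with the known gene GENE_F.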

Keywords: Candidate gene prediction; WTCCC; cis-ruption; complex diseases; coronary artery disease; genome-wide association study; miRNA; Wellcome Trust Case Control Consortium.