CANDID: a flexible method for prioritizing candidate genes for complex human traits

Genet Epidemiol. 2008 Dec;32(8):779-90. doi: 10.1002/gepi.20346.

Abstract

Genomewide studies and localized candidate gene approaches have become everyday study designs for identifying polymorphisms in genes that influence complex human traits. Yet, in general, the number of significant findings and the need to focus on smaller regions require a prioritization of genes for further study. Some candidate gene identification algorithms have been proposed in recent years to attempt to streamline this prioritization, but many suffer from limitations imposed by the source data or are difficult to use and understand. CANDID is a prioritization algorithm designed to produce impartial, accurate rankings of candidate genes that influence complex human traits. CANDID can use information from publications, protein domain descriptions, cross-species conservation measures, gene expression profiles and protein-protein interactions in its analysis. Additionally, users may supplement these data sources with results from linkage, association and other studies. CANDID was tested on well-known complex trait genes using data from the Online Mendelian Inheritance in Man database. Additionally, CANDID was evaluated in a modeled gene discovery environment, where it ranked genes whose trait associations were published after CANDID's databases were compiled. In all settings, CANDID exhibited high sensitivity and specificity, indicating an improvement upon previously published algorithms. Its accuracy and ease of use make CANDID a highly useful tool in study design and analysis for complex human traits.
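
The abstract does not describe how CANDID combines its data sources into a ranking, so the following is only a generic illustration of multi-criteria prioritization, not CANDID's actual scoring scheme. It assumes each gene already has a per-criterion score in [0, 1] and that the user supplies non-negative weights; the gene names, criterion names, scores, weights, and both function names are hypothetical. The second function shows one common way to compute sensitivity and specificity when the top-k ranked genes are treated as positive calls.

```python
def rank_genes(scores, weights):
    """Rank genes by a weighted average of per-criterion scores.

    scores  : dict mapping gene -> dict mapping criterion -> score in [0, 1]
    weights : dict mapping criterion -> non-negative weight
    A criterion missing for a gene contributes 0 (an assumption of this sketch).
    Returns a list of (gene, combined_score), best first; ties broken by name.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one criterion needs a positive weight")
    combined = {
        gene: sum(w * crit.get(name, 0.0) for name, w in weights.items()) / total
        for gene, crit in scores.items()
    }
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


def sensitivity_specificity(ranking, true_genes, top_k):
    """Treat the top_k ranked genes as positive calls and score them.

    Returns (sensitivity, specificity); a value is None when it is undefined
    (no true genes, or no non-trait genes, among the ranked candidates).
    """
    all_genes = {gene for gene, _ in ranking}
    called = {gene for gene, _ in ranking[:top_k]}
    positives = all_genes & set(true_genes)
    negatives = all_genes - positives
    tp = len(positives & called)
    fn = len(positives - called)
    tn = len(negatives - called)
    fp = len(negatives & called)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Toy, made-up inputs for illustration only.
example_weights = {"publications": 2.0, "conservation": 1.0, "interactions": 1.0}
example_scores = {
    "GENE_A": {"publications": 0.9, "conservation": 0.5, "interactions": 0.2},
    "GENE_B": {"publications": 0.1, "conservation": 0.8, "interactions": 0.3},
    "GENE_C": {"publications": 0.4, "conservation": 0.4},  # no interaction data
}

example_ranking = rank_genes(example_scores, example_weights)
for gene, score in example_ranking:
    print(f"{gene}\t{score:.3f}")
```

Sweeping `top_k` from 1 to the number of candidates and recording the resulting sensitivity and specificity pairs yields the points of an ROC curve, which is a standard way to summarize the accuracy of a ranking like this.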

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Databases, Genetic
  • Female
  • Gene Expression Profiling
  • Genetic Diseases, Inborn / genetics*
  • Genetic Linkage
  • Genetic Techniques*
  • Genome, Human
  • Humans
  • Male
  • Models, Genetic
  • Models, Statistical
  • Protein Structure, Tertiary
  • ROC Curve