Data simulation software for whole-genome association and other studies in human genetics

Pac Symp Biocomput. 2006:499-510.

Abstract

Genome-wide association studies have become a reality in the study of the genetics of complex disease. This technology provides a wealth of genomic information on patient samples, from which we hope to learn novel biology and detect important genetic and environmental factors for disease processes. Because strategies for analyzing these data have not kept pace with the laboratory methods that generate them, it is unlikely that these advances will immediately lead to an improved understanding of the genetic contribution to common human disease and drug response. Currently, no single analytical method will allow us to extract all information from a whole-genome association study. Thus, many novel methods are being proposed and developed. For these new methods to succeed, it will be vital to be able to simulate datasets consisting of polymorphisms throughout the genome with realistic linkage disequilibrium patterns. Within these datasets, we can embed genetic models of disease and then evaluate the ability of novel methods to detect the simulated effects. This paper describes a new software package, genomeSIM, for the simulation of large-scale genomic data in population-based case-control samples. It allows single-SNP models as well as gene-gene interaction models to be associated with disease risk. We describe the algorithm and demonstrate its utility for future genetic studies of whole-genome association.
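
To make the general idea concrete, the following is a minimal Python sketch of embedding a two-locus disease model in simulated SNP data and drawing a case-control sample. It is not genomeSIM's algorithm or interface: all function names, the allele frequencies, and the penetrance table are hypothetical, and the sketch assumes independent SNPs in Hardy-Weinberg equilibrium, so it does not reproduce the realistic linkage disequilibrium patterns the abstract refers to.

```python
import random


def simulate_genotypes(n_individuals, allele_freqs, rng):
    """Draw genotypes coded as 0, 1 or 2 copies of the minor allele.

    Assumption: SNPs are independent and in Hardy-Weinberg equilibrium,
    so no linkage disequilibrium is modelled here.
    """
    return [
        [sum(rng.random() < p for _ in range(2)) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def assign_status(genotypes, penetrance, snp_a, snp_b, rng):
    """Assign disease status (True = affected) from a 3x3 penetrance table.

    penetrance[i][j] is the probability of disease given i minor alleles
    at snp_a and j minor alleles at snp_b.
    """
    return [rng.random() < penetrance[g[snp_a]][g[snp_b]] for g in genotypes]


def sample_case_control(genotypes, status, n_cases, n_controls, rng):
    """Sample cases and controls without replacement from the population."""
    cases = [g for g, affected in zip(genotypes, status) if affected]
    controls = [g for g, affected in zip(genotypes, status) if not affected]
    if len(cases) < n_cases or len(controls) < n_controls:
        raise ValueError("population too small for the requested sample")
    return rng.sample(cases, n_cases), rng.sample(controls, n_controls)


# Hypothetical gene-gene interaction model: risk rises mainly when both
# loci carry at least one minor allele.
PENETRANCE = [
    [0.05, 0.05, 0.05],
    [0.05, 0.30, 0.30],
    [0.05, 0.30, 0.60],
]

rng = random.Random(0)
population = simulate_genotypes(5000, [0.3] * 10, rng)
status = assign_status(population, PENETRANCE, snp_a=0, snp_b=1, rng=rng)
cases, controls = sample_case_control(population, status, 200, 200, rng)
```

A single-SNP model fits the same table shape if every row of the penetrance table is identical, so that risk depends only on the genotype at `snp_b`.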

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Computational Biology
  • Computer Simulation
  • Databases, Genetic
  • Gene Frequency
  • Genome, Human
  • Genomics / statistics & numerical data*
  • Humans
  • Linkage Disequilibrium
  • Models, Genetic
  • Polymorphism, Single Nucleotide
  • Recombination, Genetic
  • Software*