A new analysis tool for individual-level allele frequency for genomic studies

BMC Genomics. 2010 Jul 5:11:415. doi: 10.1186/1471-2164-11-415.

Abstract

Background: Allele frequency is one of the most important population indices and has been broadly applied to genetic/genomic studies. Estimation of allele frequency using genotypes is convenient but may lose data information and be sensitive to genotyping errors.

Results: This study utilizes a unified intensity-measuring approach to estimating individual-level allele frequencies for 1,104 and 1,270 samples genotyped with the single-nucleotide-polymorphism arrays of the Affymetrix Human Mapping 100K and 500K Sets, respectively. Allele frequencies of all samples are estimated and adjusted by coefficients of preferential amplification/hybridization (CPA), and large ethnicity-specific and cross-ethnicity databases of CPA and allele frequency are established. The results show that using the CPA significantly improves the accuracy of allele frequency estimates; moreover, this paramount factor is insensitive to the time of data acquisition, effect of laboratory site, type of gene chip, and phenotypic status. Based on accurate allele frequency estimates, analytic methods based on individual-level allele frequencies are developed and successfully applied to discover genomic patterns of allele frequencies, detect chromosomal abnormalities, classify sample groups, identify outlier samples, and estimate the purity of tumor samples. The methods are packaged into a new analysis tool, ALOHA (Allele-frequency/Loss-of-heterozygosity/Allele-imbalance).

Conclusions: This is the first time that these important genetic/genomic applications have been simultaneously conducted by the analyses of individual-level allele frequencies estimated by a unified intensity-measuring approach. We expect that additional practical applications for allele frequency analysis will be found. The developed databases and tools provide useful resources for human genome analysis via high-throughput single-nucleotide-polymorphism arrays. The ALOHA software was written in R and R GUI and can be downloaded at http://www.stat.sinica.edu.tw/hsinchou/genetics/aloha/ALOHA.htm.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Gene Frequency*
  • Genome, Human*
  • Genomics / methods*
  • Humans
  • Polymorphism, Single Nucleotide
  • Software