Comparison of genotype clustering tools with rare variants

BMC Bioinformatics. 2014 Feb 21;15:52. doi: 10.1186/1471-2105-15-52.

Abstract

Background: Along with the improvement of high-throughput sequencing technologies, the genetics community is showing marked interest in the rare variants/common diseases hypothesis. While sequencing can still be prohibitive for large studies, commercially available genotyping arrays targeting rare variants prove to be a reasonable alternative. A technical challenge of array-based methods is the task of deriving genotype classes (homozygous or heterozygous) by clustering intensity data points. The performance of clustering tools for common polymorphisms is well established, whereas their performance on data sets with a large proportion of rare variants (where data points are sparse for genotypes containing the rare allele) is less well known. We have compared the performance of four clustering tools (GenCall, GenoSNP, optiCall and zCall) for the genotyping of over 10,000 samples using Illumina's HumanExome BeadChip, which includes 247,870 variants, 90% of which have a minor allele frequency below 5% in a population of European ancestry. Different reference parameters for GenCall and different initial parameters for GenoSNP were tested. Genotyping accuracy was assessed using data from the 1000 Genomes Project as a gold standard, and agreement between tools was measured.
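
To illustrate what a minor allele frequency (MAF) threshold of 5% means in practice, the following is a minimal Python sketch and is not taken from the paper. It assumes biallelic genotype calls encoded as two-letter strings ("AA", "AB", "BB") with "--" for a no-call; this encoding and the function names minor_allele_frequency and fraction_rare are hypothetical.

```python
from collections import Counter

MISSING = "--"  # assumed no-call code; real tools use their own encodings


def minor_allele_frequency(genotypes):
    """Return the MAF for one biallelic variant, or None if no calls exist.

    genotypes: iterable of strings such as "AA", "AB", "BB" or "--".
    """
    allele_counts = Counter()
    for genotype in genotypes:
        if genotype is None or genotype == MISSING:
            continue  # skip samples without a call
        allele_counts.update(genotype)  # counts each allele character
    total = sum(allele_counts.values())
    if total == 0:
        return None
    return min(allele_counts.get("A", 0), allele_counts.get("B", 0)) / total


def fraction_rare(calls_by_variant, threshold=0.05):
    """Return the fraction of variants whose MAF is below the threshold."""
    mafs = [minor_allele_frequency(g) for g in calls_by_variant.values()]
    mafs = [m for m in mafs if m is not None]
    if not mafs:
        return None
    return sum(m < threshold for m in mafs) / len(mafs)
```

With 100 samples and a single heterozygous carrier, the MAF is 1/200 = 0.005, so only one data point falls in the heterozygous cluster. This is the sparsity that makes clustering rare variants difficult.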

Results: Concordance of GenoSNP's calls with the gold standard was below expectations and was increased by changing the tool's initial parameters. While the four tools provided concordance with the gold standard above 99% for common alleles, some of them performed poorly for rare alleles. The reproducibility of genotype calls for each tool was assessed using experimental duplicates, which yielded concordance rates above 99%. The inter-tool agreement of genotype calls was high for approximately 95% of variants. Most tools yielded similar error rates (approximately 0.02), except for zCall, which performed better, with a mean error rate of 0.00164.
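
The concordance figures above can be illustrated with a short Python sketch. The abstract does not state how concordance was computed, so the definition used here (the proportion of identical genotypes among sample pairs where both calls are present) is an assumption, as are the "--" no-call code and the function name concordance. The same function would apply to comparing a tool against a gold standard, two tools against each other, or two experimental duplicates.

```python
MISSING = "--"  # assumed no-call code


def concordance(calls, reference):
    """Return the proportion of matching genotypes, or None if none compare.

    calls, reference: dicts mapping a sample ID to a genotype string
    ("AA", "AB", "BB" or "--"). Pairs with a missing call on either
    side are excluded from the denominator (an assumption).
    """
    compared = 0
    matched = 0
    for sample, ref_genotype in reference.items():
        genotype = calls.get(sample, MISSING)
        if genotype == MISSING or ref_genotype == MISSING:
            continue
        compared += 1
        # Sort alleles so that "AB" and "BA" count as the same genotype.
        if sorted(genotype) == sorted(ref_genotype):
            matched += 1
    return matched / compared if compared else None


tool_calls = {"s1": "AA", "s2": "AB", "s3": "--", "s4": "BB"}
gold_standard = {"s1": "AA", "s2": "BA", "s3": "AB", "s4": "AB"}
rate = concordance(tool_calls, gold_standard)  # 2 of 3 comparable pairs match
```

Excluding no-calls from the denominator means a tool that declines to call difficult genotypes can score well on concordance, so the call rate is worth reporting alongside it.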

Conclusions: The GenoSNP clustering tool could not be run straight "out of the box" with the HumanExome BeadChip, as modification of hard-coded parameters was necessary to achieve optimal performance. Overall, GenCall marginally outperformed the other tools for the HumanExome BeadChip. The use of experimental replicates provided a valuable quality control tool for genotyping projects with rare variants.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cluster Analysis*
  • Gene Frequency
  • Genome / genetics
  • Genomics / methods*
  • Genotype*
  • Humans
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*