Identification of probable genotyping errors by consideration of haplotypes

Eur J Hum Genet. 2006 Apr;14(4):450-8. doi: 10.1038/sj.ejhg.5201565.

Abstract

Undetected genotyping errors pose a problem in genetic epidemiological studies, as they may invalidate statistical analysis or reduce its power. Haplotype analysis demands higher data quality, because a haplotype can be inferred correctly only if the genotypes at all of its markers are correct. Here, we present a method that identifies probable genotyping errors in trio samples with the help of the haplotype frequency distribution estimated from the sample. If the likelihood of the most likely haplotype explanation depends strongly on a single genotype, in the sense that setting that genotype to missing yields a much more likely haplotype explanation, the genotype is considered a potential genotyping error. We describe a method that systematically searches the whole data set for such potential errors. Based on the haplotype distribution of a real data set, we carry out a simulation study to estimate the sensitivity and specificity of the method. In addition, we apply our approach to the real data set itself and re-determine potentially erroneous genotypes by sequencing. The results of both the simulation study and the application to the real data set show that a considerable proportion of true genotyping errors is detected and that the number of false-positive signals is acceptable. We conclude that it is indeed possible to identify probable genotyping errors by considering haplotypes. The method described here will be part of the next release of our FAMHAP software.
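The core test, which compares the best haplotype explanation of a trio with and without one genotype, can be made concrete with a small brute-force sketch. It is not the FAMHAP implementation. It assumes biallelic SNPs coded as 0/1/2 copies of allele 1, haplotype frequencies that have already been estimated and are passed in as a dictionary, Hardy-Weinberg equilibrium, and a likelihood-ratio cut-off of 100. The function names `consistent`, `best_explanation` and `flag_errors`, the threshold, and the toy frequencies are all hypothetical.

```python
# Hypothetical sketch of the "set one genotype to missing" test; not FAMHAP.
import math
from itertools import product


def consistent(hap_a, hap_b, genotype):
    """True if haplotypes hap_a and hap_b together reproduce the genotype vector.

    Genotypes are coded as the number of copies of allele 1 (0, 1 or 2);
    None is a missing genotype and matches anything.
    """
    return all(g is None or a + b == g for a, b, g in zip(hap_a, hap_b, genotype))


def best_explanation(trio, freqs):
    """Likelihood of the most likely haplotype explanation of one trio.

    trio  = (father, mother, child), each a list of genotype codes.
    freqs = dict mapping haplotype tuples to their estimated frequencies.
    Assumes Hardy-Weinberg equilibrium: an explanation's likelihood is the
    product of the frequencies of the four parental haplotypes.
    """
    father, mother, child = trio
    haps = [h for h, f in freqs.items() if f > 0]
    # Ordered (transmitted, untransmitted) pairs compatible with each parent;
    # enumerating both orders repeats explanations but leaves the maximum intact.
    f_pairs = [(t, u) for t in haps for u in haps if consistent(t, u, father)]
    m_pairs = [(t, u) for t in haps for u in haps if consistent(t, u, mother)]
    best = 0.0
    for (ft, fu), (mt, mu) in product(f_pairs, m_pairs):
        # The child carries one transmitted haplotype from each parent.
        if consistent(ft, mt, child):
            best = max(best, freqs[ft] * freqs[fu] * freqs[mt] * freqs[mu])
    return best


def flag_errors(trio, freqs, threshold=100.0):
    """Return (member, marker, ratio) for each genotype whose removal raises the
    best explanation's likelihood by more than `threshold` (hypothetical cut-off)."""
    full = best_explanation(trio, freqs)
    flags = []
    for member in range(3):  # 0 = father, 1 = mother, 2 = child
        for marker, g in enumerate(trio[member]):
            if g is None:
                continue
            reduced = [list(ind) for ind in trio]
            reduced[member][marker] = None  # set this one genotype to missing
            relaxed = best_explanation(reduced, freqs)
            if relaxed == 0.0:
                continue  # still no valid explanation without this genotype
            # A Mendelian inconsistency (full == 0) that disappears is always flagged.
            ratio = math.inf if full == 0.0 else relaxed / full
            if ratio > threshold:
                flags.append((member, marker, ratio))
    return flags


# Toy example with made-up frequencies for three SNPs (not data from the paper).
freqs = {(0, 0, 0): 0.49, (1, 1, 1): 0.49, (0, 1, 0): 0.01, (1, 0, 1): 0.01}
# Both parents are heterozygous at all three SNPs. The child's genotype at
# marker 1 was recorded as 0. This is consistent with Mendelian inheritance
# but forces one parent to carry the rare pair (1, 0, 1)/(0, 1, 0).
trio = ([1, 1, 1], [1, 1, 1], [1, 0, 1])
print(flag_errors(trio, freqs))
```

With these toy frequencies only the child's genotype at marker 1 should be flagged. Setting it to missing lets both parents carry common haplotypes and raises the likelihood by a factor of about 2401 (0.49² / 0.01²). Removing either parent's genotype at that marker raises it only by a factor of 49, which falls below the cut-off. Enumerating all haplotype pairs is feasible only for a handful of markers. The sketch is meant to make the likelihood comparison concrete, not to scale to real data sets.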

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation*
  • Gene Frequency
  • Genetic Markers / genetics
  • Genotype
  • Haplotypes*
  • Humans
  • Models, Genetic*
  • Predictive Value of Tests
  • Research Design*

Substances

  • Genetic Markers