Comparison of INDEL Calling Tools with Simulation Data and Real Short-Read Data

IEEE/ACM Trans Comput Biol Bioinform. 2019 Sep-Oct;16(5):1635-1644. doi: 10.1109/TCBB.2018.2854793. Epub 2018 Jul 10.

Abstract

Insertions and deletions (INDELs) account for a significant proportion of human genetic variation, and recent studies have indicated that many human diseases may be attributable to INDELs. With the development of next-generation sequencing (NGS) technology, many statistical and computational tools have been developed for calling INDELs. However, these tools differ from one another, and comparisons among them have been limited. To better understand these inter-tool differences, five popular and publicly available INDEL calling tools (GATK HaplotypeCaller, Platypus, VarScan2, Scalpel, and GotCloud) were evaluated using simulation data, 1000 Genomes Project data, and family-based sequencing data. The accuracy of INDEL calling by each tool was evaluated mainly by concordance rates. The family-based sequencing data, which consisted of 49 individuals from eight Korean families, were used to calculate Mendelian error rates. Our comparison shows that GATK HaplotypeCaller usually performs best and that joint calling with Platypus can lead to additional improvements in accuracy. The results of this study provide important information regarding future directions for variant detection and algorithm development.
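
As a rough illustration of the two metrics named in the abstract, the sketch below computes a concordance rate between two call sets and a Mendelian error rate over parent-child trios. The abstract does not give the exact definitions used in the study, so both are assumptions here: concordance is taken as shared calls divided by all calls made by either tool, and a Mendelian error is a child genotype that cannot be formed from one allele of each parent at a diploid site. The function names and the example data are hypothetical, and real INDEL calls would normally need to be normalized (for example, left-aligned) before being compared.

```python
from itertools import product


def concordance_rate(calls_a, calls_b):
    """Shared calls divided by all calls made by either tool (assumed definition)."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def is_mendelian_consistent(child, father, mother):
    """True if the child's genotype can be built from one allele of each parent."""
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def mendelian_error_rate(trios):
    """Fraction of (child, father, mother) genotype triples that violate inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, f, m) for c, f, m in trios)
    return errors / len(trios)


# Hypothetical INDEL calls keyed by (chromosome, position, ref allele, alt allele).
tool_a = {("1", 100, "AT", "A"), ("1", 200, "G", "GCC"), ("2", 50, "CAG", "C")}
tool_b = {("1", 100, "AT", "A"), ("2", 50, "CAG", "C"), ("2", 75, "T", "TA")}

# Hypothetical genotypes (0 = reference allele, 1 = INDEL allele).
trios = [
    ((0, 1), (0, 0), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # error: father cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 1), (1, 1)),  # consistent
]

print(concordance_rate(tool_a, tool_b))
print(mendelian_error_rate(trios))
```

Under these assumed definitions, a higher concordance rate means two tools agree on more sites, while a lower Mendelian error rate suggests fewer genotyping errors within families.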

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology
  • Computer Simulation
  • DNA Mutational Analysis* / methods
  • DNA Mutational Analysis* / standards
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing* / methods
  • High-Throughput Nucleotide Sequencing* / standards
  • Humans
  • INDEL Mutation / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA* / methods
  • Sequence Analysis, DNA* / standards
  • Software*