From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software

Nucleic Acids Res. 2013 Jan 7;41(1):e16. doi: 10.1093/nar/gks836. Epub 2012 Sep 10.

Abstract

Scientists working with single-nucleotide variants (SNVs) inferred by next-generation sequencing software often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or by several independent SNV-callers. Here we demonstrate that 0.5-60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases, pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences.
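
To illustrate why comparing nucleotide signals can be more sensitive than comparing called genotypes, the following minimal sketch tests whether the reference/alternate read counts at one coordinate differ between two samples. It is not pibase's implementation: the function names `fisher_exact_two_sided` and `compare_site`, the example counts, and the choice of Fisher's exact test are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_site(counts_a, counts_b, ref, alt):
    """Compare ref/alt read counts at one coordinate between two samples.

    counts_a and counts_b are hypothetical per-sample base counts,
    e.g. {"A": 15, "G": 14}, as might be tallied from an alignment.
    """
    return fisher_exact_two_sided(
        counts_a.get(ref, 0), counts_a.get(alt, 0),
        counts_b.get(ref, 0), counts_b.get(alt, 0),
    )


# Hypothetical counts: sample 1 looks heterozygous, sample 2 nearly homozygous.
sample_1 = {"A": 15, "G": 14}
sample_2 = {"A": 29, "G": 1}
p_value = compare_site(sample_1, sample_2, ref="A", alt="G")
print(f"p = {p_value:.3g}")
```

A small p-value here flags a difference in allele balance even in cases where a genotype caller might assign both samples the same genotype, which is the kind of signal-level comparison the abstract describes.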

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Variation*
  • Genomics
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Phylogeny
  • Reproducibility of Results
  • Sequence Alignment*
  • Sequence Analysis, DNA*
  • Software*
  • Twins, Monozygotic / genetics