Algorithmic improvements for discovery of germline copy number variants in next-generation sequencing data

BMC Bioinformatics. 2022 Jul 19;23(1):285. doi: 10.1186/s12859-022-04820-w.

Abstract

Background: Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from NGS data has remained challenging, particularly for hybridization-capture data in which read counts are the primary source of copy number information.

Results: We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs ranging in size from a single exon to a full chromosome.
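To make the distinction between the two decoders concrete, the following is a minimal sketch, not Cobalt's implementation. Viterbi returns the single most probable state path, whereas PMAP picks, at each target independently, the state with the highest posterior marginal (computed here with the forward-backward algorithm). Everything specific in the sketch is an illustrative assumption: the three copy number states, the Gaussian emission model on normalized depth with mean cn/2 and a per-target standard deviation, the transition probabilities, and all function names. The emission model in particular is a simple stand-in and does not reproduce the method described in the paper.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def gaussian_log_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def emission_log_probs(depths, target_sds, copy_numbers=(1, 2, 3)):
    """Assumed model: normalized depth at target t ~ Normal(cn / 2, sd_t)."""
    return [[gaussian_log_pdf(d, cn / 2.0, sd) for cn in copy_numbers]
            for d, sd in zip(depths, target_sds)]


def viterbi(log_init, log_trans, log_emit):
    """Most probable joint state path."""
    n = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n)]
    back = []
    for t in range(1, len(log_emit)):
        prev = score
        # Best predecessor state for each current state s.
        pointers = [max(range(n), key=lambda r: prev[r] + log_trans[r][s])
                    for s in range(n)]
        score = [prev[pointers[s]] + log_trans[pointers[s]][s] + log_emit[t][s]
                 for s in range(n)]
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def pointwise_map(log_init, log_trans, log_emit):
    """State with the highest posterior marginal at each position."""
    n, length = len(log_init), len(log_emit)
    fwd = [[log_init[s] + log_emit[0][s] for s in range(n)]]
    for t in range(1, length):
        fwd.append([log_emit[t][s]
                    + log_sum_exp([fwd[t - 1][r] + log_trans[r][s] for r in range(n)])
                    for s in range(n)])
    bwd = [[0.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        bwd[t] = [log_sum_exp([log_trans[s][r] + log_emit[t + 1][r] + bwd[t + 1][r]
                               for r in range(n)])
                  for s in range(n)]
    # The posterior marginal is proportional to exp(fwd + bwd) at each position.
    return [max(range(n), key=lambda s: fwd[t][s] + bwd[t][s])
            for t in range(length)]


# Hypothetical parameters: states index copy numbers (1, 2, 3).
STAY, SWITCH = 0.99, 0.005
log_trans = [[math.log(STAY if r == s else SWITCH) for s in range(3)]
             for r in range(3)]
log_init = [math.log(SWITCH), math.log(STAY), math.log(SWITCH)]

# Made-up normalized depths: a clear three-target heterozygous deletion.
depths = [1.0, 1.0, 0.5, 0.5, 0.5, 1.0, 1.0]
log_emit = emission_log_probs(depths, target_sds=[0.1] * len(depths))

print("Viterbi:", viterbi(log_init, log_trans, log_emit))
print("PMAP:   ", pointwise_map(log_init, log_trans, log_emit))
```

On a strong multi-target signal such as this one, both decoders should agree. They can diverge when the evidence is weak or confined to a single target, because Viterbi charges the cost of entering and leaving a state against the whole path, while PMAP evaluates each target's marginal on its own. A known trade-off of pointwise decoding is that the resulting sequence of states is not guaranteed to be a high-probability path as a whole.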

Conclusions: In both the simulation and previously detected CNV studies, Cobalt shows sensitivity similar to that of other callers but produces significantly fewer false positive detections. Overall sensitivity is 80-90% for deletion CNVs spanning 1-4 targets and 90-100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.

Keywords: Copy number variants (CNV); Hidden Markov model; Next generation sequencing; Whole exome sequencing.

MeSH terms

  • Algorithms
  • Computer Simulation
  • DNA Copy Number Variations*
  • Exons
  • Germ Cells
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans