Long read genome assemblers struggle with small plasmids

Microb Genom. 2023 May;9(5):mgen001024. doi: 10.1099/mgen.0.001024.

Abstract

Whole-genome sequencing has become a preferred method for studying bacterial plasmids, as it is generally assumed to capture the entire genome. However, long-read genome assemblers have been shown to sometimes miss plasmid sequences, an issue that has been associated with plasmid size. The purpose of this study was to investigate the relationship between plasmid size and plasmid recovery by four long-read-only assemblers: Flye, Raven, Miniasm, and Canu. This was accomplished by determining the number of times each assembler successfully recovered 33 plasmids, ranging from 1919 to 194 062 bp in size and belonging to 14 bacterial isolates from six bacterial genera, using Oxford Nanopore long reads. These results were additionally compared to plasmid recovery rates by the short-read-first assembler, Unicycler, using both Oxford Nanopore long reads and Illumina short reads. Results from this study indicate that Canu, Flye, Miniasm, and Raven are prone to missing plasmid sequences, whereas Unicycler recovered 100 % of plasmid sequences. Excluding Canu, most plasmid loss by long-read-only assemblers was due to failure to recover plasmids smaller than 10 kb. As such, it is recommended that Unicycler be used to increase the likelihood of plasmid recovery during bacterial genome assembly.
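
The comparison described above comes down to tallying, for each assembler, how often plasmids below and above a size cutoff were recovered. The following is a minimal sketch of that tally and is not the authors' analysis code. The function name `recovery_rates`, the record layout, the assembler labels, and every recovered/missed flag are hypothetical placeholders. Only the 10 kb cutoff and the two plasmid sizes (1919 and 194 062 bp) are taken from the abstract.

```python
from collections import defaultdict

# 10 kb cutoff mentioned in the abstract; everything else here is assumed.
SMALL_PLASMID_BP = 10_000


def recovery_rates(records, cutoff_bp=SMALL_PLASMID_BP):
    """Summarise plasmid recovery per assembler and size class.

    records: iterable of (assembler, plasmid_size_bp, recovered) tuples.
    Returns {assembler: {"small": rate, "large": rate}}, where a rate is
    the fraction of plasmids recovered, or None if the bin is empty.
    """
    # Each bin holds [recovered_count, total_count].
    counts = defaultdict(lambda: {"small": [0, 0], "large": [0, 0]})
    for assembler, size_bp, recovered in records:
        size_class = "small" if size_bp < cutoff_bp else "large"
        tally = counts[assembler][size_class]
        tally[0] += int(recovered)
        tally[1] += 1
    return {
        assembler: {
            size_class: (hits / total if total else None)
            for size_class, (hits, total) in bins.items()
        }
        for assembler, bins in counts.items()
    }


# Illustrative records only: the outcomes below are invented, not study results.
example_records = [
    ("assembler_a", 1919, False),
    ("assembler_a", 4500, True),
    ("assembler_a", 194062, True),
    ("assembler_b", 1919, True),
    ("assembler_b", 194062, True),
]

rates = recovery_rates(example_records)
for assembler, by_size in sorted(rates.items()):
    print(assembler, by_size)
```

Keeping the cutoff as a parameter makes it straightforward to check whether a pattern of loss depends on the specific 10 kb threshold.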

Keywords: Plasmids; hybrid genome assembly; long-read sequencing; whole-genome sequencing.

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genome, Bacterial*
  • Nanopores*
  • Plasmids / genetics
  • Whole Genome Sequencing