NextPolish: a fast and efficient genome polishing tool for long-read assembly

Jiang Hu; Junpeng Fan; Zongyi Sun; Shanlin Liu

doi:10.1093/bioinformatics/btz891

Bioinformatics. 2020 Apr 1;36(7):2253-2255. doi: 10.1093/bioinformatics/btz891.

Authors

Jiang Hu¹, Junpeng Fan¹, Zongyi Sun¹, Shanlin Liu¹

Affiliation

¹ GrandOmics Biosciences, Beijing, 102200, China.

PMID: 31778144
DOI: 10.1093/bioinformatics/btz891

Abstract

Motivation: Although long-read sequencing technologies can produce highly contiguous genome assemblies, they suffer from high error rates. We therefore developed NextPolish, a tool that efficiently corrects sequence errors in genomes assembled from long reads. The tool consists of two interlinked modules designed to score and count k-mers from high-quality short reads and to polish genome assemblies containing large numbers of base errors.
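To make the idea of counting and scoring k-mers from short reads concrete, the following is a minimal Python sketch. It is not NextPolish's actual algorithm or code: the function names (count_kmers, score_candidate), the toy reads, and the scoring rule (summing the short-read counts of a candidate's k-mers) are all assumptions chosen for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across the given reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def score_candidate(candidate, counts, k):
    """Toy score (an assumption, not the published method):
    the sum of short-read counts over all k-mers in the candidate."""
    return sum(counts[candidate[i:i + k]] for i in range(len(candidate) - k + 1))


# Hypothetical short reads and two versions of the same assembly region.
reads = ["ACGTACGT", "CGTACGTA", "ACGTACGA"]
k = 4
counts = count_kmers(reads, k)

draft = "ACGTTCGT"  # contains a single-base error
fixed = "ACGTACGT"  # candidate correction

draft_score = score_candidate(draft, counts, k)
fixed_score = score_candidate(fixed, counts, k)

# The candidate better supported by the short reads receives the higher score.
best = fixed if fixed_score > draft_score else draft
```

In this toy setting, a candidate sequence whose k-mers occur often in the short reads is preferred over one whose k-mers are rare or absent, which is the general intuition behind using short-read k-mer support to choose among corrections.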

Results: When evaluated for speed and efficiency on a human genome and a plant (Arabidopsis thaliana) genome, NextPolish outperformed Pilon, correcting sequence errors faster and with higher accuracy.

Availability and implementation: NextPolish is implemented in C and Python. The source code is available from https://github.com/Nextomics/NextPolish.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Algorithms*
Genome
High-Throughput Nucleotide Sequencing*
Humans
Poland
Sequence Analysis, DNA
Software