PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction

Genes (Basel). 2019 Jan 22;10(2):73. doi: 10.3390/genes10020073.

Abstract

Phylogenetic trees are essential for understanding evolution and are usually constructed through multiple sequence alignment, which suffers from a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or specific valuable features, e.g., patterns that are frequent across the whole dataset. To improve on this, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary vector recording which of the sequential patterns mined across all sequences it contains. The phylogenetic tree is then constructed by clustering a distance matrix calculated from these pattern vectors. To increase accuracy for highly divergent sequences, we incorporate pattern weights and filter out redundant sub-patterns. Results on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.
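The pipeline outlined in the abstract (mine patterns shared across sequences, encode each sequence as a binary pattern vector, compute a distance matrix, cluster it into a tree) can be illustrated with a minimal Python sketch. It is not PVTree's implementation. It assumes that contiguous substrings of length 3 to 6 stand in for general sequential patterns, that a pattern's weight is simply its length, that distances are weighted Jaccard distances, and that UPGMA is the clustering step. All function names and parameters are hypothetical.

```python
from itertools import combinations


def mine_patterns(seqs, min_len=3, max_len=6, min_support=2):
    """Count contiguous substrings (a simplification of general sequential
    patterns) and keep those present in at least `min_support` sequences."""
    support = {}
    for s in seqs:
        seen = set()  # count each pattern once per sequence
        for k in range(min_len, max_len + 1):
            for i in range(len(s) - k + 1):
                seen.add(s[i:i + k])
        for p in seen:
            support[p] = support.get(p, 0) + 1
    return {p: c for p, c in support.items() if c >= min_support}


def filter_redundant(patterns):
    """Drop a sub-pattern when a longer pattern containing it has the same
    support: both then occur in exactly the same sequences, so the shorter
    one would add an identical column to every vector."""
    return {
        p: c for p, c in patterns.items()
        if not any(len(q) > len(p) and p in q and patterns[q] == c for q in patterns)
    }


def weighted_jaccard_distance(u, v, weights):
    """1 - (weight of shared patterns) / (weight of patterns in either vector)."""
    inter = sum(w for a, b, w in zip(u, v, weights) if a and b)
    union = sum(w for a, b, w in zip(u, v, weights) if a or b)
    return 1.0 - inter / union if union else 0.0  # two empty vectors are identical


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns a Newick topology
    without branch lengths."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # id -> (newick, size)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair; keys are stored as (smaller, larger)
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (dik * si + djk * sj) / (si + sj)  # size-weighted mean
        del d[(i, j)]
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


def build_tree(names, seqs, min_support=2):
    patterns = filter_redundant(mine_patterns(seqs, min_support=min_support))
    order = sorted(patterns)
    vecs = [[1 if p in s else 0 for p in order] for s in seqs]  # binary pattern vectors
    weights = [len(p) for p in order]  # assumed weighting: longer patterns count more
    n = len(seqs)
    dist = [[weighted_jaccard_distance(vecs[i], vecs[j], weights) for j in range(n)]
            for i in range(n)]
    return upgma(names, dist)


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    seqs = ["ACGTACGTTG", "ACGTACGTTC", "TTGCATGCAA", "TTGCATGCAT"]
    print(build_tree(names, seqs))  # prints ((C,D),(A,B));
```

In this toy input, A and B share a 9-character prefix, as do C and D, and the only retained pattern spanning both groups is TTG, so the sketch recovers the two pairs. UPGMA is used here only because it works directly on a distance matrix without extra dependencies; another distance-based method such as neighbor joining could be substituted.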

Keywords: alignment free; multiple sequence alignment; phylogenetic tree; sequential pattern mining.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Phylogeny*
  • Sequence Alignment / methods*
  • Sequence Alignment / standards
  • Software*