Long Branch Attraction (LBA) is a well-known artifact in phylogenetic reconstruction when dealing with branch length heterogeneity. Here we show another phenomenon, Short Branch Attraction (SBA), which occurs when BLAST searches, a phenetic analysis, are used as a surrogate method for phylogenetic analysis. This error also results from branch length heterogeneity, but this time it is the short branches that are attracting. The SBA artifact is reciprocal and can be returned 100% of the time when multiple branches differ in length by a factor of more than two. SBA is an intended feature of BLAST searches, but becomes an issue, when top scoring BLAST hit analyses are used to infer Horizontal Gene Transfers (HGTs), assign taxonomic category with environmental sequence data in phylotyping, or gather homologous sequences for building gene families. SBA can lead researchers to believe that there has been a HGT event when only vertical descent has occurred, cause slowly evolving taxa to be over-represented and quickly evolving taxa to be under-represented in phylotyping, or systematically exclude quickly evolving taxa from analyses. SBA also contributes to the changing results of top scoring BLAST hit analyses as the database grows, because more slowly evolving taxa, or short branches, are added over time, introducing more potential for SBA. SBA can be detected by examining reciprocal best BLAST hits among a larger group of taxa, including the known closest phylogenetic neighbors. Therefore, one should look for this phenomenon when conducting best BLAST hit analyses as a surrogate method to identify HGTs, in phylotyping, or when using BLAST to gather homologous sequences.
Keywords: Branch length space; Gene transfer detection; HGT; LBA; Long Branch Attraction; Phylotyping; Short Branch Attraction; Top scoring BLAST hit.
Copyright © 2016 Elsevier Inc. All rights reserved.