Selecting models of nucleotide substitution: an application to human immunodeficiency virus 1 (HIV-1)

Mol Biol Evol. 2001 Jun;18(6):897-906. doi: 10.1093/oxfordjournals.molbev.a003890.

Abstract

The blind use of models of nucleotide substitution in evolutionary analyses is a common practice in the viral community. Typically, a simple model of evolution like the Kimura two-parameter model is used for estimating genetic distances and phylogenies, either because other authors have used it or because it is the default in various phylogenetic packages. Using two statistical approaches to model fitting, hierarchical likelihood ratio tests and the Akaike information criterion, we show that different viral data sets are better explained by different models of evolution. We demonstrate our results with the analysis of HIV-1 sequences from a hierarchy of samples: sequences within individuals, individuals within subtypes, and subtypes within groups. We also examine results for three different gene regions: gag, pol, and env. The Kimura two-parameter model was not selected as the best-fit model for any of these data sets, despite its widespread use in phylogenetic analyses of HIV-1 sequences. Furthermore, the model complexity increased with increasing sequence divergence. Finally, the molecular-clock hypothesis was rejected in most of the data sets analyzed, throwing into question clock-based estimates of divergence times for HIV-1. The importance of models in evolutionary analyses and their repercussions on the derived conclusions are discussed.
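
The two model-fitting approaches named above can be illustrated with a minimal sketch. It assumes the maximized log-likelihood of each candidate model has already been obtained from a phylogenetics package. The function names (`aic`, `chi2_sf`, `lrt`) and the log-likelihood values are hypothetical and chosen only for illustration; they are not results from the paper. The parameter counts are the usual numbers of free substitution parameters for each model, with branch lengths left out because they are shared by all candidates.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = -2 lnL + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total

def lrt(lnl_null, lnl_alt, df):
    """Likelihood ratio test for nested models: statistic 2(lnL1 - lnL0) and p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)

# Hypothetical (log-likelihood, free substitution parameters) per model.
# These numbers are illustrative assumptions, not values from the paper.
models = {
    "JC69": (-5230.4, 0),
    "K80": (-5101.7, 1),
    "HKY85": (-5040.2, 4),
    "GTR": (-5031.9, 8),
}

# AIC: pick the model with the smallest value.
best_by_aic = min(models, key=lambda name: aic(*models[name]))

# Hierarchical LRTs: walk up the nested chain, moving to the more complex
# model only when the simpler one is rejected at the chosen alpha.
alpha = 0.01
chain = ["JC69", "K80", "HKY85", "GTR"]
selected = chain[0]
for simpler, richer in zip(chain, chain[1:]):
    lnl0, k0 = models[simpler]
    lnl1, k1 = models[richer]
    stat, p_value = lrt(lnl0, lnl1, k1 - k0)
    if p_value >= alpha:
        break
    selected = richer

print("Best by AIC:", best_by_aic)
print("Selected by hierarchical LRTs:", selected)
```

The chi-square approximation used here applies to nested models whose extra parameters are not fixed at a boundary of their range. The order of the tests in the hierarchy and the significance level are choices of the analyst, and the sketch fixes one simple ordering for brevity.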

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Databases, Factual
  • Evolution, Molecular
  • HIV-1 / genetics*
  • Humans
  • Models, Genetic*
  • Phylogeny*
  • Point Mutation
  • Polymorphism, Genetic